Welcome to MilkyWay@home

tasks being sent to wrong gpu card

Message boards : Number crunching : tasks being sent to wrong gpu card

Joseph Stateson
Joined: 18 Nov 08
Posts: 291
Credit: 2,461,693,501
RAC: 0
Message 48942 - Posted: 24 May 2011, 12:53:35 UTC
Last modified: 24 May 2011, 13:52:39 UTC

Is there any way to direct MilkyWay tasks to my GTX 460 and avoid the 9800 GTX, which is single precision only? I had three tasks fail within seconds because they were assigned to the wrong card. Here is one of them. The computer ID shows a pair of GTX 460s, but that is not true: there is only one, and the other card is a 9800 GTX. This combination works fine for PrimeGrid and Collatz, but not for MilkyWay.

I recently added some single-slot GTX 460s to my systems and now have double precision capability.

[EDIT] Possibly the GTX 460 is not fully double precision capable. I just read that it is either 1/6 or 1/12 as capable as other cards having DP.
ID: 48942
ExtraTerrestrial Apes
Joined: 1 Sep 08
Posts: 204
Credit: 219,354,537
RAC: 0
Message 48952 - Posted: 24 May 2011, 20:41:01 UTC - in response to Message 48942.  
Last modified: 24 May 2011, 20:41:54 UTC

Any nVidia card except the current Fermi-based Teslas (and maybe Quadros) is as "not fully double precision" as your GTX460. The chip (like all other mainstream Fermis) can run dp at 1/12th the sp speed. Fermi could be as fast as 1/2, but consumer cards with GF100 and GF110 (GTX465, GTX470, GTX480, GTX570, GTX580, GTX590) are artificially limited to 1/8th the sp performance.

ATI's Cayman (HD6950, HD6970, HD6990) runs at 1/2 the sp rate for some dp operations and at 1/4 for others, whereas previous ATIs achieved 2/5 and 1/5. Coupled with the higher sp performance of the ATIs, that is the reason why they are so much faster and more efficient at MW.
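To make the arithmetic concrete, here is a rough sketch of what those fractions imply for peak dp throughput (the sp figures below are illustrative placeholders, not vendor-verified numbers):

```python
# Back-of-the-envelope DP peak from SP peak and the DP:SP ratios discussed above.
def dp_peak(sp_gflops: float, ratio: float) -> float:
    """Theoretical double precision peak given a single precision peak and a DP:SP ratio."""
    return sp_gflops * ratio

# Illustrative SP peaks only (placeholders, not measured values):
print(dp_peak(900.0, 1 / 12))   # GTX 460-class at 1/12 -> ~75 GFLOPS dp
print(dp_peak(1500.0, 1 / 8))   # GF110 consumer at 1/8 -> ~188 GFLOPS dp
print(dp_peak(2700.0, 1 / 4))   # Cayman-class at 1/4   -> ~675 GFLOPS dp
```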

MrS
Scanning for our furry friends since Jan 2002
ID: 48952
europa
Joined: 29 Oct 10
Posts: 89
Credit: 39,246,947
RAC: 0
Message 48954 - Posted: 24 May 2011, 21:14:16 UTC - in response to Message 48942.  

I had the same problem a couple of months back and posted it here, but never got it resolved beyond moving GPUs among my machines so that none had a mixed single precision/double precision combo.

Regards,
Steve
ID: 48954
Joseph Stateson
Joined: 18 Nov 08
Posts: 291
Credit: 2,461,693,501
RAC: 0
Message 48957 - Posted: 25 May 2011, 2:19:32 UTC

I want to correct the computer ID I listed above. It should have been
http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=258308

I did get a response over at the BOINC forum where Jord suggested running two copies of BOINC. What he suggested will work, but it is a PITA to set up.

ID: 48957
ExtraTerrestrial Apes
Joined: 1 Sep 08
Posts: 204
Credit: 219,354,537
RAC: 0
Message 48962 - Posted: 25 May 2011, 8:20:18 UTC - in response to Message 48957.  

Sounds like BOINC should learn the difference between sp- and dp-capable cards, in addition to the CUDA compute capability level (and something similar for ATIs).

MrS
Scanning for our furry friends since Jan 2002
ID: 48962
Joseph Stateson
Joined: 18 Nov 08
Posts: 291
Credit: 2,461,693,501
RAC: 0
Message 53263 - Posted: 18 Feb 2012, 14:08:30 UTC

I still have this problem, but now it is a GTX 570 and a GTS 250. The 570 is a full-length, triple-width card and the GTS 250 a 2/3-length, double-width card that just barely fits. I do not see any way to route MW tasks away from the 250, so I will stop MW on this system.
ID: 53263
arkayn
Joined: 14 Feb 09
Posts: 999
Credit: 74,932,619
RAC: 0
Message 53269 - Posted: 18 Feb 2012, 16:29:45 UTC - in response to Message 53263.  

I still have this problem, but now it is a GTX 570 and a GTS 250. The 570 is a full-length, triple-width card and the GTS 250 a 2/3-length, double-width card that just barely fits. I do not see any way to route MW tasks away from the 250, so I will stop MW on this system.


There is one way to do it now, but you would have to install the 7.xx alpha version.

It has a cc_config.xml option to make the client ignore a GPU for a project.
ID: 53269
Len LE/GE
Joined: 8 Feb 08
Posts: 261
Credit: 104,050,322
RAC: 0
Message 53274 - Posted: 18 Feb 2012, 19:35:50 UTC

http://boinc.berkeley.edu/wiki/Client_configuration :

<exclude_gpu>
Don't use the given GPU for the given project. If <device_num> is not specified, exclude all GPUs of the given type. <type> is required if your computer has more than one type of GPU; otherwise it can be omitted. <app> specifies the short name of an application (i.e. the <name> element within the <app> element in client_state.xml). If specified, only tasks for that app are excluded. You may include multiple <exclude_gpu> elements. (New in 6.13 )

<exclude_gpu>
<url>project_URL</url>
[<device_num>N</device_num>]
[<type>nvidia|ati</type>]
[<app>appname</app>]
</exclude_gpu>


and

From the code of MW Separation v1.0x there is a command line param:
--device [device number passed by BOINC to use]

No idea if and how that works.
Anyone with two GPUs willing to test it?
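If anyone does test it: for anonymous-platform apps the usual place to pass such a flag is the <cmdline> element of app_info.xml. A hypothetical fragment is below; the app name, version number, and executable filename are placeholders, not copied from a real MW install:

```xml
<app_info>
  <app>
    <name>milkyway</name>
  </app>
  <file_info>
    <name>milkyway_separation_windows_x86_64__cuda.exe</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>milkyway</app_name>
    <version_num>100</version_num>
    <!-- Hypothetical: pin this app to the second GPU (devices are numbered from 0). -->
    <cmdline>--device 1</cmdline>
    <coproc>
      <type>CUDA</type>
      <count>1</count>
    </coproc>
    <file_ref>
      <file_name>milkyway_separation_windows_x86_64__cuda.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>
```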
ID: 53274
Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0
Message 53311 - Posted: 19 Feb 2012, 18:44:32 UTC - in response to Message 48942.  

The GTX460 definitely does [support double precision]. The fraction just means that is how fast it is compared to single precision.

The multi-GPU support in BOINC can only be described as completely broken, which is why I think you need to create the cc_config.xml with the <use_all_gpus> (or whatever it is) option to even do so. Your GPUs get folded into whatever BOINC decides is the "most capable" GPU, which is completely wrong.

You can either use app_info with the --device argument (looks like it should be --device 1 in your case, if you are talking about this system: http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=285405) or use the per-project device exclusion in cc_config.xml.
ID: 53311
Joseph Stateson
Joined: 18 Nov 08
Posts: 291
Credit: 2,461,693,501
RAC: 0
Message 53529 - Posted: 3 Mar 2012, 14:53:58 UTC - in response to Message 53274.  

http://boinc.berkeley.edu/wiki/Client_configuration :

<exclude_gpu>
Don't use the given GPU for the given project. If <device_num> is not specified, exclude all GPUs of the given type. <type> is required if your computer has more than one type of GPU; otherwise it can be omitted. <app> specifies the short name of an application (i.e. the <name> element within the <app> element in client_state.xml). If specified, only tasks for that app are excluded. You may include multiple <exclude_gpu> elements. (New in 6.13 )

<exclude_gpu>
<url>project_URL</url>
[<device_num>N</device_num>]
[<type>nvidia|ati</type>]
[<app>appname</app>]
</exclude_gpu>


and

From the code of MW Separation v1.0x there is a command line param:
--device [device number passed by BOINC to use]

No idea if and how that works.
Anyone with two GPUs willing to test it?


I put the following together and am running it under 7.0.18


    <cc_config>
    <log_flags>
    </log_flags>
    <options>
    <use_all_gpus>1</use_all_gpus>
    <exclude_gpu>
    <url>http://milkyway.cs.rpi.edu/milkyway/</url>
    <device_num>1</device_num>
    </exclude_gpu>
    <exclude_gpu>
    <url>http://milkyway.cs.rpi.edu/milkyway/</url>
    <device_num>2</device_num>
    </exclude_gpu>
    </options>
    </cc_config>



Results from reading cc_config follow.


    13 2012-03-03 8:28:21 AM NVIDIA GPU 0: GeForce GTX 460 (driver version 285.62, CUDA version 4.10, compute capability 2.1, 1024MB, 933MB available, 907 GFLOPS peak)
    14 2012-03-03 8:28:21 AM NVIDIA GPU 1: GeForce GTS 250 (driver version 285.62, CUDA version 4.10, compute capability 1.1, 1024MB, 970MB available, 705 GFLOPS peak)
    15 2012-03-03 8:28:21 AM NVIDIA GPU 2: GeForce GTS 250 (driver version 285.62, CUDA version 4.10, compute capability 1.1, 1024MB, 937MB available, 705 GFLOPS peak)

    <snip>
    94 2012-03-03 8:39:29 AM Re-reading cc_config.xml
    95 2012-03-03 8:39:29 AM Config: use all coprocessors
    96 Milkyway@Home 2012-03-03 8:39:29 AM Config: excluded GPU. Type: all. App: all. Device: 1
    97 Milkyway@Home 2012-03-03 8:39:29 AM Config: excluded GPU. Type: all. App: all. Device: 2



I also verified it was working: Milkyway was assigned to device 0 only and the other two GPUs were idle during the same time.

ID: 53529
Sunny129
Joined: 25 Jan 11
Posts: 271
Credit: 346,072,284
RAC: 0
Message 53881 - Posted: 1 Apr 2012, 3:40:50 UTC

i'm a bit late to the party, but i can also confirm that the <exclude_gpu> function works. here are my cc_config and my event log:

<cc_config>
<options>
<exclude_gpu>
<url>http://setiathome.berkeley.edu/</url>
<device_num>0</device_num>
</exclude_gpu>
<exclude_gpu>
<url>http://milkyway.cs.rpi.edu/milkyway/</url>
<device_num>1</device_num>
</exclude_gpu>
</options>
</cc_config>


3/31/2012 9:57:56 PM | | ATI GPU 0: AMD Radeon HD 6900 series (Cayman) (CAL version 1.4.1664, 2048MB, 2031MB available, 6144 GFLOPS peak)
3/31/2012 9:57:56 PM | | ATI GPU 1: ATI Radeon HD 5800 series (Cypress) (CAL version 1.4.1664, 2048MB, 2023MB available, 5440 GFLOPS peak)
3/31/2012 9:57:56 PM | | OpenCL: ATI GPU 0: Cayman (driver version CAL 1.4.1664, device version OpenCL 1.1 AMD-APP (851.4), 1024MB, 2031MB available)
3/31/2012 9:57:56 PM | | OpenCL: ATI GPU 1: Cypress (driver version CAL 1.4.1664, device version OpenCL 1.1 AMD-APP (851.4), 1024MB, 2023MB available)
3/31/2012 9:57:56 PM | | ATI GPU 0 is OpenCL-capable
3/31/2012 9:57:56 PM | Milkyway@Home | Found app_info.xml; using anonymous platform
3/31/2012 9:57:56 PM | SETI@home | Found app_info.xml; using anonymous platform
3/31/2012 9:57:56 PM | SETI@home | Config: excluded GPU. Type: all. App: all. Device: 0
3/31/2012 9:57:56 PM | Milkyway@Home | Config: excluded GPU. Type: all. App: all. Device: 1


as you can see, i'm using the <exclude_gpu> function to exclude the HD 5870 from Milkyway@Home so it can focus on SETI@Home and running the display. at the same time, i'm using the function to exclude the HD 6950 from SETI@Home and display duties so it can focus solely on Milkyway@Home. i run 2 MW@H tasks simultaneously on the 6950, and either 2 S@H Astropulse tasks or 2 S@H Multibeam tasks simultaneously on the 5870. sometimes a single Astropulse task will run alongside a single Multibeam task.

...i'm not really sure why it doesn't show ATI GPU 1 as OpenCL-capable as well, even though it clearly states above that my HD 5870 also makes use of OpenCL. not a big deal though...i've already run some OpenCL-based v1.02 tasks on it successfully.

i should note that i was running BOINC v6.12.41 and was pleasantly informed by the manager under the messages tab that it didn't recognize a command in the cc_config file. going back to where i originally referenced it, i found out that only BOINC v6.13.xx and later recognize the <exclude_gpu> command. i actually skipped v6.13.xx and updated straight to v7.0.12, since it recognizes OpenCL-capable devices.

my major gripe with v7.0.12 so far is the altered functionality of the project caches: whereas with v6.12.41 i could maintain a decent-sized but not overwhelming cache, v7.0.12 drains the caches of all of my projects until they're nonexistent, and only then does my host bother to communicate with the project server to get more work. if i manually update a project while any amount of work is left in its cache, it'll just say "not reporting or requesting work" in the event log.

is anyone else experiencing these kinds of shenanigans with BOINC v7.0.12? perhaps a different v7.0.xx might solve it? or might i have to go back to v6.13.xx to eliminate this behavior? and if so, which of the v6.13.xx versions are stable while crunching the new Separation v1.02 tasks?

TIA,
Eric
ID: 53881
kashi
Joined: 30 Dec 07
Posts: 311
Credit: 149,490,184
RAC: 0
Message 53883 - Posted: 1 Apr 2012, 12:45:24 UTC - in response to Message 53881.  

With development BOINC 7.0.xx versions, Connect about every x.xx days has now effectively become Minimum work buffer. In fact in the later 7.0.xx versions it has been renamed. If you leave it at 0 days which was previously recommended for an always on connection it will not download any new tasks until your cache is empty. The value needs to be set above 0 if you want to download work before the cache is empty.

This new method of controlling work download may cause tasks to run in high priority mode on projects that have a short deadline. The higher the value you use for Connect about every x.xx days/Minimum work buffer, the more likely it is that tasks may go into high priority mode. It depends on the deadline, for example with projects with a short deadline of 2 days, tasks may run high priority with a value above 0.7-0.8 days. This can cause trouble if they run out of order while leaving other tasks "waiting to run". In later BOINC versions (after 7.0.14, I think) high priority tasks run in "earliest deadline first" order which overcomes this problem on most projects. WCG can still upset the applecart though if a computer becomes a trusted host and gets sent repair tasks with a shorter deadline than normal tasks. These repair tasks may start to run in high priority mode as soon as they are downloaded.
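As a toy model of the buffer-vs-deadline interaction described above (an illustrative assumption, not the actual BOINC scheduler logic; the estimated-runtime figure is made up):

```python
# Toy model: a task risks high-priority mode when the minimum work buffer plus
# its estimated runtime approaches the deadline. Illustrative only; the real
# BOINC client uses a more involved deadline-miss simulation.
def may_run_high_priority(min_buffer_days: float,
                          est_runtime_days: float,
                          deadline_days: float) -> bool:
    return min_buffer_days + est_runtime_days >= deadline_days

# With a 2-day deadline and ~1.2 days of queued runtime, a buffer around
# 0.8 days is roughly where tasks start tripping into high priority:
print(may_run_high_priority(0.9, 1.2, 2.0))  # True
print(may_run_high_priority(0.5, 1.2, 2.0))  # False
```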
ID: 53883
Sunny129
Joined: 25 Jan 11
Posts: 271
Credit: 346,072,284
RAC: 0
Message 53884 - Posted: 1 Apr 2012, 16:35:29 UTC - in response to Message 53883.  

With development BOINC 7.0.xx versions, Connect about every x.xx days has now effectively become Minimum work buffer. In fact in the later 7.0.xx versions it has been renamed. If you leave it at 0 days which was previously recommended for an always on connection it will not download any new tasks until your cache is empty. The value needs to be set above 0 if you want to download work before the cache is empty.

This new method of controlling work download may cause tasks to run in high priority mode on projects that have a short deadline. The higher the value you use for Connect about every x.xx days/Minimum work buffer, the more likely it is that tasks may go into high priority mode. It depends on the deadline, for example with projects with a short deadline of 2 days, tasks may run high priority with a value above 0.7-0.8 days. This can cause trouble if they run out of order while leaving other tasks "waiting to run". In later BOINC versions (after 7.0.14, I think) high priority tasks run in "earliest deadline first" order which overcomes this problem on most projects. WCG can still upset the applecart though if a computer becomes a trusted host and gets sent repair tasks with a shorter deadline than normal tasks. These repair tasks may start to run in high priority mode as soon as they are downloaded.

thanks for the details. i remember reading something about that in another thread not too long ago, but i couldn't remember exactly what was said or which thread it was in. so again, thanks for making it clear here. i will give BOINC v7.0.15 (or later) a try this evening and treat the "Connect about every x.xx days" parameter as the minimum work buffer instead.
ID: 53884
Sunny129
Joined: 25 Jan 11
Posts: 271
Credit: 346,072,284
RAC: 0
Message 53889 - Posted: 2 Apr 2012, 13:56:56 UTC

well i made the switch to BOINC v7.0.24 last night...mind you, the only reason i did it was to get BOINC to recognize the <exclude_gpu> function in order to force GPU 0 to run Milkyway@Home only and GPU 1 to run SETI@Home only...not b/c the update was necessary in order to run the new OpenCL-based Separation v1.02 tasks. v7.0.24 has correctly renamed the parameter fields of interest from "connect about every x.xx days" & "additional work buffer x.xx days (max. 10)" to "minimum work buffer x.xx days" & "max. additional work buffer x.xx days." but even with the minimum work buffer and the max additional work buffer set to 5 days & 2 days respectively, i'm still experiencing the same behavior as before - both the Milkyway and SETI task queues are being drained completely before either project server allows more work to be downloaded. i should note that i believe this is being enforced server-side, b/c if i manually update either project before the queues have run down to zero, my host will report whatever tasks have completed, but won't "request any new tasks."

is anyone using BOINC v7.0.15 or later able to build up a queue and maintain it, or are we all experiencing this behavior with our hosts?
ID: 53889
arkayn
Joined: 14 Feb 09
Posts: 999
Credit: 74,932,619
RAC: 0
Message 53890 - Posted: 2 Apr 2012, 15:30:32 UTC - in response to Message 53889.  

well i made the switch to BOINC v7.0.24 last night...mind you, the only reason i did it was to get BOINC to recognize the <exclude_gpu> function in order to force GPU 0 to run Milkyway@Home only and GPU 1 to run SETI@Home only...not b/c the update was necessary in order to run the new OpenCL-based Separation v1.02 tasks. v7.0.24 has correctly renamed the parameter fields of interest from "connect about every x.xx days" & "additional work buffer x.xx days (max. 10)" to "minimum work buffer x.xx days" & "max. additional work buffer x.xx days." but even with the minimum work buffer and the max additional work buffer set to 5 days & 2 days respectively, i'm still experiencing the same behavior as before - both the Milkyway and SETI task queues are being drained completely before either project server allows more work to be downloaded. i should note that i believe this is being enforced server-side, b/c if i manually update either project before the queues have run down to zero, my host will report whatever tasks have completed, but won't "request any new tasks."

is anyone using BOINC v7.0.15 or later able to build up a queue and maintain it, or are we all experiencing this behavior with our hosts?


I have a "full" queue from SETI with 7.0.18; I have my preferences set globally at
Maintain enough tasks to keep busy for at least(max 10 days). 5 days
... and up to an additional 0.5 days 


While my other computer running 6.12.x is at
Maintain enough tasks to keep busy for at least(max 10 days). 0 days 
... and up to an additional 5 days 

ID: 53890
Sunny129
Joined: 25 Jan 11
Posts: 271
Credit: 346,072,284
RAC: 0
Message 53891 - Posted: 2 Apr 2012, 18:53:29 UTC - in response to Message 53890.  
Last modified: 2 Apr 2012, 19:40:35 UTC

I have a "full" queue from SETI with 7.0.18, I have my preferences set globally at
Maintain enough tasks to keep busy for at least(max 10 days). 5 days
... and up to an additional 0.5 days 

well those are similar to my settings (i had the minimum work buffer set to 5 days, and the additional buffer set to 2)...though i had only made those changes host-side via the BOINC manager's settings. i've since implemented those settings server-side via my web preferences as well...although i was always under the impression that the local host settings override the web preference settings, and therefore make it unnecessary to set any of the web preferences that are also made available for edit through the BOINC manager itself.

regardless, i've implemented the settings in both places just to be sure, and unfortunately i'm still seeing the same behavior as before - my SETI and Milkyway caches are draining completely before refilling (as opposed to downloading fewer tasks at a time, more often, and maintaining X number of tasks in the queue at all times).

any other ideas why the work buffers aren't working like they should?

it's like i'm stuck in a bad dream...either i use BOINC v7.x.xx to gain access to the <exclude_gpu> function and sacrifice normal scheduling and queue characteristics, or i go back to BOINC v6.12.xx to get normal scheduling and queues and sacrifice the ability to use the <exclude_gpu> function (which i need in order to run both Milkyway@Home and SETI@Home GPU apps at the same time on the same machine). if only BOINC v6.13.xx were a happy medium between BOINC v6.12.xx and v7.x.xx, but it's not - v6.13.xx exhibits the same buffer/queue problems for me that v7.x.xx does.

*UPDATE* - just finished a run of Multibeam tasks, at which point a single AP task was downloaded. the same thing happened the last time i got an AP task. so not only can i not maintain a queue of S@H tasks, but i'm getting no more than 1 AP task at a time...which really poses a problem for my host should the server go down...i could be out of work (both AP and MB) for ridiculous amounts of time...
ID: 53891
kashi
Joined: 30 Dec 07
Posts: 311
Credit: 149,490,184
RAC: 0
Message 53895 - Posted: 3 Apr 2012, 4:12:37 UTC

Check that the "Remaining (estimated)" time for tasks is correct. I remember when POEM changed to fixed-credit tasks, the DCF was tiny so the Remaining time went very high. I had to use a Minimum work buffer of 10 days or change the <flops> value in my app_info.xml file in order to increase the number of tasks in my cache. The DCF would adjust over a few days, but if a resend task with the much smaller estimated computation size was downloaded and processed, then the DCF would go way down again.
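The <flops> trick works because of how the client turns a workunit's estimated FP operation count into a time estimate. A simplified sketch (an assumption for illustration; the real client applies further corrections):

```python
# Simplified model of BOINC's remaining-time estimate: DCF times the workunit's
# estimated FP operations divided by the app's claimed speed. Illustrative only.
def estimated_runtime_s(rsc_fpops_est: float, flops: float, dcf: float = 1.0) -> float:
    return dcf * rsc_fpops_est / flops

# Raising <flops> in app_info.xml shrinks the estimate, so the client fetches
# more tasks to fill the same work buffer (numbers are placeholders):
print(estimated_runtime_s(1e15, 1e11))  # 10000.0 seconds
print(estimated_runtime_s(1e15, 4e11))  # 2500.0 seconds
```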

Newer versions of BOINC report double the GFLOPS value for ATI cards. The older calculation was based on code by Crunch3r and related to double precision. This may affect GPU task work fetch.

You are doing 2 different types of work unit on SETI, perhaps this causes fluctuations in the DCF value. Fluctuating DCF could cause work fetch problems for both GPU projects. Although different cards are assigned to different projects, BOINC work fetch algorithms for the 2 GPUs are possibly combined, so the amount of work in the cache on one project affects when and how many tasks are downloaded for the other project.

You could try suspending one GPU project and see if the other project then downloads work. While you're at it you could suspend any CPU projects too. It shouldn't make a difference but development BOINC versions may use the ncpus value in strange ways.

You could also try setting your CPU projects to No new tasks and then increasing Minimum work buffer to the maximum 10 days to see if that forces GPU work download before cache is dry.

You've been checking Event Log I suppose, to ensure you haven't had error tasks causing your daily quota to be reduced.

Maybe you could get greater detail in your Event Log by using a cc_config.xml with the extra logging commands related to work fetch.

ID: 53895
mikey
Joined: 8 May 09
Posts: 3328
Credit: 522,544,748
RAC: 80,420
Message 53897 - Posted: 3 Apr 2012, 10:09:58 UTC - in response to Message 53891.  

I have a "full" queue from SETI with 7.0.18, I have my preferences set globally at
Maintain enough tasks to keep busy for at least(max 10 days). 5 days
... and up to an additional 0.5 days 

well those are similar to my settings (i had the minimum work buffer set to 5 days, and the additional buffer set to 2)...though i had only made those changes host-side via the BOINC manager's settings. i've since implemented those settings server-side via my web preferences as well...although i was always under the impression that the local host settings override the web preference settings, and therefore make it unnecessary to set any of the web preferences that are also made available for edit through the BOINC manager itself.

regardless, i've implemented the settings in both places just to be sure, and unfortunately i'm still seeing the same behavior as before - my SETI and Milkyway caches are draining completely before refilling (as opposed to downloading fewer tasks at a time, more often, and maintaining X number of tasks in the queue at all times).

any other ideas why the work buffers aren't working like they should?

it's like i'm stuck in a bad dream...either i use BOINC v7.x.xx to gain access to the <exclude_gpu> function and sacrifice normal scheduling and queue characteristics, or i go back to BOINC v6.12.xx to get normal scheduling and queues and sacrifice the ability to use the <exclude_gpu> function (which i need in order to run both Milkyway@Home and SETI@Home GPU apps at the same time on the same machine). if only BOINC v6.13.xx were a happy medium between BOINC v6.12.xx and v7.x.xx, but it's not - v6.13.xx exhibits the same buffer/queue problems for me that v7.x.xx does.

*UPDATE* - just finished a run of Multibeam tasks, at which point a single AP task was downloaded. the same thing happened the last time i got an AP task. so not only can i not maintain a queue of S@H tasks, but i'm getting no more than 1 AP task at a time...which really poses a problem for my host should the server go down...i could be out of work (both AP and BM) for ridiculous amounts of time...


The new BOINC version 7 is NOT clear on the words it uses in the cache area, in the early versions anyway. If you are coming directly from version 6 to version 7 you should reverse the numbers you had in version 6. The newer version 7 releases are clearer about what each set of numbers does: the 1st field now says "minimum work buffer" while the 2nd says "max additional work buffer" (this is in version 7.0.23). In version 6 the smaller number went in the 1st field and the bigger in the 2nd; carry those straight into version 7 and you are telling it to maintain only a VERY small minimum work buffer, which is not what most of us want. I just reversed the numbers and my cache has not changed from one version to the next.
ID: 53897
Sunny129
Joined: 25 Jan 11
Posts: 271
Credit: 346,072,284
RAC: 0
Message 53902 - Posted: 3 Apr 2012, 17:55:28 UTC - in response to Message 53897.  
Last modified: 3 Apr 2012, 17:57:55 UTC

The new Boinc version 7 is NOT clear on the words they are using in the cache area, the early versions anyway. If you are coming directly from version 6 to version 7 you should reverse the numbers you had in version 6. The newer version 7 releases are clearer about what each set of numbers does. But basically the 1st set of numbers now says "minimum work buffer", while the 2nd set says "max additional work buffer", this is in version 7.0.23. Essentially in version 6 you reverse those two with the smaller being in the 1st section and the bigger being in the 2nd part. BUT in version 7 that says to maintain only a VERY small minimum work buffer, which is not what most of us want. I just reversed the numbers and my cache has not changed from the one version to the next.

well before i was running BOINC v6.12.41, where the "connect about every x.xx days" field was set to 0 and the "additional work buffer" field was set to 5 days. so now that i'm running BOINC v7.0.24, i have the "minimum work buffer" field set to 5 and the "additional work buffer" set to 0...in other words, the numbers are now reversed as you suggested. i made this change a good 6 hours ago, and have noticed no positive changes. my host has contacted the server several times since then to report finished tasks, but has not requested any new work, so nothing has changed...then again, the small queue of SETI tasks i had since this morning is only now winding down to zero - the last SETI WU is being crunched right now. perhaps the cache behavior will change after the next request for work (which won't happen until this last task finishes and reports). *EDIT* - the last SETI WU just completed, uploaded, and reported, but i did not receive a new cache of WU's b/c the server is down for maintenance...so i guess i'll have to wait a while before i see any new tasks...

i may have to further experiment with Kashi's suggestions and suspend one of my two GPU projects (and possibly even my CPU projects) to see how it affects the cache behavior of the one project left running.
ID: 53902
mikey
Joined: 8 May 09
Posts: 3328
Credit: 522,544,748
RAC: 80,420
Message 53908 - Posted: 4 Apr 2012, 10:51:25 UTC - in response to Message 53902.  

The new Boinc version 7 is NOT clear on the words they are using in the cache area, the early versions anyway. If you are coming directly from version 6 to version 7 you should reverse the numbers you had in version 6. The newer version 7 releases are clearer about what each set of numbers does. But basically the 1st set of numbers now says "minimum work buffer", while the 2nd set says "max additional work buffer", this is in version 7.0.23. Essentially in version 6 you reverse those two with the smaller being in the 1st section and the bigger being in the 2nd part. BUT in version 7 that says to maintain only a VERY small minimum work buffer, which is not what most of us want. I just reversed the numbers and my cache has not changed from the one version to the next.

well before i was running BOINC v6.12.41, where the "connect about every x.xx days" field was set to 0 and the "additional work buffer" field was set to 5 days. so now that i'm running BOINC v7.0.24, i have the "minimum work buffer" field set to 5 and the "additional work buffer" set to 0...in other words, the numbers are now reversed as you suggested. i made this change a good 6 hours ago, and have noticed no positive changes. my host has contacted the server several times since then to report finished tasks, but has not requested any new work, so nothing has changed...then again, the small queue of SETI tasks i had since this morning is only now winding down to zero - the last SETI WU is being crunched right now. perhaps the cache behavior will change after the next request for work (which won't happen until this last task finishes and reports). *EDIT* - the last SETI WU just completed, uploaded, and reported, but i did not receive a new cache of WU's b/c the server is down for maintenance...so i guess i'll have to wait a while before i see any new tasks...

i may have to further experiment with Kashi's suggestions and suspend one of my two GPU projects (and possibly even my CPU projects) to see how it affects the cache behavior of the one project left running.


I hate it when a project goes down while you are testing!!!
ID: 53908

©2024 Astroinformatics Group