Welcome to MilkyWay@home

getting errors with new v1.02 separation application?

Message boards : Number crunching : getting errors with new v1.02 separation application?

kashi
Joined: 30 Dec 07
Posts: 311
Credit: 149,490,184
RAC: 0
Message 53359 - Posted: 21 Feb 2012, 5:48:48 UTC

Yes, that log appears to show that the BOINC OpenCL detection is broken. It is detecting the HD 4290 as a Cypress-class GPU that is OpenCL capable when it is not, and it is not detecting the OpenCL capability of the HD 5870. With incorrect OpenCL detection by BOINC as a starting point, any configuration settings and .xml files you use to try to get an OpenCL application working will be ineffective.

What a horror. I can't think of any solution other than disabling the onboard video, as you have already thought of yourself.

As for the target parameters I use "--gpu-target-frequency 100" with my 5870 to process 2 concurrent tasks. Any lower than 100 and the lag with 2 concurrent tasks is unbearable on my system.

arkayn
Joined: 14 Feb 09
Posts: 999
Credit: 74,932,619
RAC: 0
Message 53360 - Posted: 21 Feb 2012, 6:34:06 UTC - in response to Message 53359.  

Not quite, actually. The problem is that it only detects the OpenCL capability of the 5870, but since it has already disabled ATI device 0 (the 4290), it thinks it also has to disable OpenCL ATI device 0, which is actually the 5870.

2/20/2012 11:42:10 PM | | ATI GPU 0: (not used) Cypress (CAL version 1.4.1664, 341MB, 324MB available, 107 GFLOPS peak)
2/20/2012 11:42:10 PM | | ATI GPU 1: ATI Radeon HD 5800 series (Cypress) (CAL version 1.4.1664, 2048MB, 2031MB available, 5440 GFLOPS peak)
2/20/2012 11:42:10 PM | | OpenCL: ATI GPU 0 (not used): Cypress (driver version CAL 1.4.1664, device version OpenCL 1.1 AMD-APP (851.4), 1024MB, 324MB available)


This is one of those weird situations.
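arkayn's reading of the log can be checked mechanically. The Python sketch below pulls the ATI device lines out of a BOINC startup log and lists, per API (CAL or OpenCL), the device number BOINC assigned and whether it printed "(not used)". The regular expression is an assumption fitted only to the three lines quoted above; other BOINC versions may word these lines differently, and the function names are hypothetical.

```python
import re
from dataclasses import dataclass

# The three startup lines quoted above (BOINC 6.12.x on this host).
SAMPLE_LOG = [
    "2/20/2012 11:42:10 PM | | ATI GPU 0: (not used) Cypress (CAL version 1.4.1664, 341MB, 324MB available, 107 GFLOPS peak)",
    "2/20/2012 11:42:10 PM | | ATI GPU 1: ATI Radeon HD 5800 series (Cypress) (CAL version 1.4.1664, 2048MB, 2031MB available, 5440 GFLOPS peak)",
    "2/20/2012 11:42:10 PM | | OpenCL: ATI GPU 0 (not used): Cypress (driver version CAL 1.4.1664, device version OpenCL 1.1 AMD-APP (851.4), 1024MB, 324MB available)",
]

# "| | " separator, optional "OpenCL: " prefix, then "ATI GPU <n>" and the rest of the line.
_DEVICE_RE = re.compile(r"\|\s*\|\s*(OpenCL: )?ATI GPU (\d+)(.*)$")


@dataclass
class GpuEntry:
    api: str    # "CAL" or "OpenCL"
    index: int  # device number BOINC assigned within that API
    name: str
    used: bool  # False when BOINC printed "(not used)"


def parse_ati_gpus(log_lines):
    entries = []
    for line in log_lines:
        m = _DEVICE_RE.search(line)
        if not m:
            continue
        rest = m.group(3)
        used = "(not used)" not in rest
        # Drop the marker and the leading ":" so only "name (details...)" remains.
        rest = rest.replace("(not used)", "").strip().lstrip(":").strip()
        # The device name ends where the version details begin.
        name = re.split(r" \((?:CAL|driver) version", rest, maxsplit=1)[0]
        api = "OpenCL" if m.group(1) else "CAL"
        entries.append(GpuEntry(api, int(m.group(2)), name, used))
    return entries


def usable_opencl(entries):
    """OpenCL devices BOINC has not marked as "(not used)"."""
    return [e for e in entries if e.api == "OpenCL" and e.used]


if __name__ == "__main__":
    found = parse_ati_gpus(SAMPLE_LOG)
    for e in found:
        print(e)
    print("usable OpenCL devices:", usable_opencl(found))
```

On this log the only device BOINC will use is CAL GPU 1 (the HD 5870), while the single OpenCL entry is numbered 0 and marked "(not used)", so no usable OpenCL device is left for the v1.02 application.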

kashi
Joined: 30 Dec 07
Posts: 311
Credit: 149,490,184
RAC: 0
Message 53361 - Posted: 21 Feb 2012, 6:51:01 UTC

Yes that's probably it. The "324MB available" made me think the OpenCL line was referring to the HD 4290. Either way she's a mixed up, shook up girl.

Sunny129
Joined: 25 Jan 11
Posts: 271
Credit: 346,072,284
RAC: 0
Message 53369 - Posted: 21 Feb 2012, 18:51:30 UTC

You know, about a year ago I had a heck of a time getting the MW@H and SETI@Home projects working on the same machine (using my HD 5870). I was finally able to do it after hours (maybe even days) of experimentation, but the hoops I had to jump through to get it all working were tedious, to say the least. Basically it came down to using a GPU ignore option in cc_config.xml. When I wanted to crunch MW@H, I would set it to ignore GPU 0 (the integrated HD 4290), which seems logical enough, since MW@H would try to crunch on the HD 4290 and throw errors if I did not ignore it. But getting SETI@Home to work was completely counter-intuitive: using the same GPU ignore option in cc_config.xml, I had to change it to ignore GPU 1 (the HD 5870), and that would, ironically, allow SETI@Home to run properly on the HD 5870. If I tried to crunch SETI@Home while ignoring GPU 0 (the integrated HD 4290), SETI@Home would actually try to run on the HD 4290 and throw errors. Long story short, these issues were all caused by the fact that I was running 2 GPUs (the HD 4290, which was the dedicated display GPU, and the HD 5870, which was the dedicated cruncher), and by their respective characteristics regarding CAL, OpenCL, etc.
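For reference, here is a minimal Python sketch of the kind of file described above. It assumes BOINC 6.x's client-wide `<ignore_ati_dev>` option (the post itself only says "GPU ignore argument"), which applies to every project at once; that would explain why the device number had to be swapped by hand when switching between MW@H and SETI@Home. The helper and variable names are hypothetical.

```python
import xml.etree.ElementTree as ET


def cc_config_ignoring_ati(device_nums):
    """Return cc_config.xml text that makes the BOINC client skip the given ATI GPUs.

    device_nums are the numbers BOINC prints at startup ("ATI GPU 0", "ATI GPU 1", ...).
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    for num in device_nums:
        ET.SubElement(options, "ignore_ati_dev").text = str(num)
    return ET.tostring(root, encoding="unicode")


# The two variants swapped in and out of the BOINC data directory:
MW_AT_HOME = cc_config_ignoring_ati([0])    # hide the HD 4290 IGP
SETI_AT_HOME = cc_config_ignoring_ati([1])  # counter-intuitively, hide the HD 5870

if __name__ == "__main__":
    print(MW_AT_HOME)
    print(SETI_AT_HOME)
```

Generating both variants from one function at least keeps the two files from drifting apart while you swap them.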

I have a feeling that, if it's at all possible to get the new Separation v1.02 tasks working without completely disabling the IGP via the BIOS, it's going to require some wacky, counter-intuitive cmdline argument or cc_config option (or a combination of the two), and that it'll take quite some time to find...if it's possible at all.

I'm not looking forward to disabling the IGP. I'm just very skeptical of people's claims of minimal or nonexistent GUI lag when crunching and running the display on the same GPU. I have a feeling that GUI lag is going to be quite subjective, just like computer noise...after all, the amount of noise one person finds perfectly tolerable may be far too loud for someone else's liking.

kashi
Joined: 30 Dec 07
Posts: 311
Credit: 149,490,184
RAC: 0
Message 53372 - Posted: 22 Feb 2012, 0:59:48 UTC

Yes, that's true; it would depend on what you are used to and how you use the computer. I remember when I was using an HD 5970 and an HD 5870, things were slowed a little. Mouse clicks and opening files were not noticeably affected, but webpages took a bit longer to display. I had thought it was my internet connection until one day when I had stopped GPU processing because MilkyWay and Collatz were both down at the same time. Webpages displayed almost instantly, and I thought the phone line must have been upgraded. When MilkyWay came back and I started GPU crunching again, I noticed that the slight delay in webpage display had returned, and the penny dropped.

I didn't say lag was non-existent with a target frequency of 100; I said it was unbearable on this computer with it lower than 100. The lag I experienced with more recent MilkyWay versions running 2 concurrent tasks, when I used the default or a value less than 100 for --gpu-target-frequency, was in another league altogether from the "normal" lag of a GPU that drives the screen also being used for processing. Mouse clicks went unanswered for a long time, and the screen updated very slowly or froze completely. Basically, the computer became unusable.

I'm currently running POEM OpenCL application on a single 5870 and it is a lot less GPU intensive than many other GPU projects. Even running 4 tasks concurrently and using 3 CPU cores to support GPU processing, GPU load is still only 90% and current draw is relatively low. Webpages display noticeably faster than when I'm running more intensive GPU projects at 99-100% GPU load.

Sunny129
Joined: 25 Jan 11
Posts: 271
Credit: 346,072,284
RAC: 0
Message 53373 - Posted: 22 Feb 2012, 1:35:10 UTC
Last modified: 22 Feb 2012, 1:35:42 UTC

I wasn't referencing you when I mentioned claims of nonexistent GUI lag on systems whose GPUs share both display and crunching duties. I've heard it before from others on the forums...and from members I trust, no less. It's not that I don't trust or believe their claims; it's like I said, I think GUI lag will be subjective...of course, there's no way for me to know if I don't try it myself. Hopefully I'll get a chance to disable the IGP in the BIOS before the weekend and see if I can't get Separation v1.02 tasks running error-free.

[AF>EDLS] Polynesia
Joined: 5 Apr 09
Posts: 71
Credit: 6,120,786
RAC: 0
Message 53374 - Posted: 22 Feb 2012, 12:08:47 UTC

Hello,

It doesn't fail all the time, but whether with the latest version 6.12.34 or not, I do have units that end in errors...

I have the latest Nvidia driver...

Here is one of the error messages:

<core_client_version>6.12.34</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
<search_application> milkyway_separation 1.02 Windows x86_64 double OpenCL </search_application>
Unrecognized XML in project preferences: max_gfx_cpu_pct
Skipping: 2
Skipping: /max_gfx_cpu_pct
Unrecognized XML in project preferences: allow_non_preferred_apps
Skipping: 1
Skipping: /allow_non_preferred_apps
Guessing preferred OpenCL vendor 'NVIDIA Corporation'
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Error reading astronomy parameters from file 'astronomy_parameters.txt'
Trying old parameters file
Using SSE4.1 path
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
<background_integral> 1.#QNAN0000000000 </background_integral>
<stream_integral> 1.#QNAN0000000000 1.#QNAN0000000000 </stream_integral>
<background_likelihood> 1.#QNAN0000000000 </background_likelihood>
<stream_only_likelihood> 1.#QNAN0000000000 1.#QNAN0000000000 </stream_only_likelihood>
<search_likelihood> 1.#QNAN0000000000 </search_likelihood>
12:54:51 (4224): called boinc_finish

</stderr_txt>
]]>

Thank you for your support...
Team Alliance francophone, boinc: 7.0.18

GA-P55-UD5, i7 860, Win 7 64 bits, 8g DDR3, GTX 470

Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0
Message 53379 - Posted: 22 Feb 2012, 22:10:00 UTC - in response to Message 53374.  

Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Something is wrong with your driver installation, but it looks like you've fixed it now since you have tasks succeeding.
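The key line is the `-1001` status. A small, hypothetical helper can scan a task's stderr for OpenCL status codes and attach a hint. The hint strings are illustrative and follow Matt's diagnosis: `-1001` is `CL_PLATFORM_NOT_FOUND_KHR` from the OpenCL ICD extension, returned when no OpenCL platform (driver) can be found. The same code appears in Dave Kelly's log further down this thread.

```python
import re

# Hints keyed by OpenCL status code (values as defined in the Khronos headers).
HINTS = {
    -1: "CL_DEVICE_NOT_FOUND: no OpenCL device of the requested type on the platform.",
    -1001: "No OpenCL platform found: check or reinstall the GPU driver.",
}

# Matches fragments like "(-1001): CL_PLATFORM_NOT_FOUND_KHR". The application's own
# "(1): MW_CL_ERROR" is deliberately not matched (positive code, no CL_ prefix).
_STATUS_RE = re.compile(r"\((-\d+)\):\s*(CL_[A-Z0-9_]+)")


def diagnose(stderr_text):
    """Return (code, name, hint) for each OpenCL error reported in the stderr text."""
    results = []
    for code_text, name in _STATUS_RE.findall(stderr_text):
        code = int(code_text)
        results.append((code, name, HINTS.get(code, "No hint for this code.")))
    return results


SAMPLE_STDERR = """Using SSE4.1 path
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR"""

if __name__ == "__main__":
    for code, name, hint in diagnose(SAMPLE_STDERR):
        print(code, name, "->", hint)
```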

[AF>EDLS] Polynesia
Joined: 5 Apr 09
Posts: 71
Credit: 6,120,786
RAC: 0
Message 53386 - Posted: 23 Feb 2012, 6:31:55 UTC

Not exactly... It works well for a while, and then suddenly some tasks go wrong...
Team Alliance francophone, boinc: 7.0.18

GA-P55-UD5, i7 860, Win 7 64 bits, 8g DDR3, GTX 470

Dave Kelly
Joined: 21 Sep 09
Posts: 2
Credit: 142,153,832
RAC: 1,941
Message 53389 - Posted: 23 Feb 2012, 8:26:14 UTC - in response to Message 53386.  

I started having errors in WUs done on the GPU (a GTX 560 Ti) in computer 114912.
All units done on the GPU failed after less than 3 seconds.
These started at roughly the same time as I upgraded to Nvidia's latest driver, 295.73.
Work units were processed correctly while I was using the machine, so I set the display to never turn off (in Control Panel > Power Options).
No units have errored out since (about 18 hours).

Dave Kelly

The stderr output of one of them (88846109) is shown below.

<core_client_version>6.12.34</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
<search_application> milkyway_separation 1.02 Windows x86_64 double OpenCL </search_application>
Unrecognized XML in project preferences: nvidia_block_amount
Skipping: 128
Skipping: /nvidia_block_amount
Guessing preferred OpenCL vendor 'NVIDIA Corporation'
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Error reading astronomy parameters from file 'astronomy_parameters.txt'
Trying old parameters file
Using AVX path
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
<background_integral> 1.#QNAN0000000000 </background_integral>
<stream_integral> 1.#QNAN0000000000 1.#QNAN0000000000 </stream_integral>
<background_likelihood> 1.#QNAN0000000000 </background_likelihood>
<stream_only_likelihood> 1.#QNAN0000000000 1.#QNAN0000000000 </stream_only_likelihood>
<search_likelihood> 1.#QNAN0000000000 </search_likelihood>
11:00:24 (7020): called boinc_finish

</stderr_txt>
]]>

arkayn
Joined: 14 Feb 09
Posts: 999
Credit: 74,932,619
RAC: 0
Message 53393 - Posted: 23 Feb 2012, 16:37:43 UTC - in response to Message 53389.  

Over at Lunatics, we noticed a bug in the 295.xx drivers: when the DVI-connected display went to sleep, the CUDA card disappeared. It only seems to affect DVI connections, as users on VGA or HDMI have not seen this problem.

We went back to the 290.53 drivers as a response.

[AF>EDLS] Polynesia
Joined: 5 Apr 09
Posts: 71
Credit: 6,120,786
RAC: 0
Message 53399 - Posted: 23 Feb 2012, 23:04:28 UTC - in response to Message 53393.  

Does this apply to me too?

Team Alliance francophone, boinc: 7.0.18

GA-P55-UD5, i7 860, Win 7 64 bits, 8g DDR3, GTX 470

arkayn
Joined: 14 Feb 09
Posts: 999
Credit: 74,932,619
RAC: 0
Message 53401 - Posted: 24 Feb 2012, 1:24:54 UTC - in response to Message 53399.  

Looks like it. I would suggest going back to the 290.53 drivers.

Sunny129
Joined: 25 Jan 11
Posts: 271
Credit: 346,072,284
RAC: 0
Message 53402 - Posted: 24 Feb 2012, 4:46:44 UTC

OK guys, so I went ahead and disabled the IGP in the BIOS and made the discrete HD 5870 the one and only GPU in the box, and sure enough my host started downloading and crunching v1.02 tasks without error...save for a few, which occurred because I had some incorrect cmdline parameter syntax in the app_info.xml file; it's since been fixed. Mind you, I'm running BOINC v6.12.34 (not v7.x.xx), but the Catalyst drivers have been updated to 12.1. The GUI lag was pretty horrendous with a single task running, and naturally even worse with 2 simultaneous tasks. So I used the --gpu-target-frequency cmdline parameter (incorrectly at first, which caused a few errors) to try to reduce the GUI lag. I brought the value all the way up to 200 (up from the default 60), and GUI lag got considerably better, but it is a LONG way from being gone completely...and unfortunately it has also caused run times to increase noticeably, from ~2:05 to ~2:15.
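As a rough mental model of this trade-off (an assumption about how the option behaves, not the application's actual code): if `--gpu-target-frequency` asks the app to split its GPU work so that each kernel launch lasts about 1/frequency seconds, the desktop waits at most that long for a redraw, but the work is split into proportionally more launches, each carrying some fixed overhead. The per-launch overhead and the 120 s of GPU work used below are made-up placeholders.

```python
import math


def kernel_launches(gpu_seconds, target_hz):
    """Launches needed if each one is capped at about 1/target_hz seconds."""
    return math.ceil(gpu_seconds * target_hz)


def worst_display_wait_ms(target_hz):
    """Longest the display would wait for one kernel to finish, in milliseconds."""
    return 1000.0 / target_hz


def estimated_run_seconds(gpu_seconds, target_hz, overhead_per_launch_s):
    """Pure GPU time plus a fixed (hypothetical) cost per kernel launch."""
    return gpu_seconds + kernel_launches(gpu_seconds, target_hz) * overhead_per_launch_s


if __name__ == "__main__":
    for hz in (60, 100, 200):
        print(hz, "Hz:",
              f"{worst_display_wait_ms(hz):.1f} ms max wait,",
              f"{estimated_run_seconds(120, hz, 0.001):.0f} s estimated run time")
```

Under this model, going from the default 60 to 200 cuts the worst-case wait from about 16.7 ms to 5 ms but more than triples the number of launches, which fits the pattern described above: less lag, longer run times.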

If I can't find a target frequency that satisfies my need for efficient crunching AND minimal or no GUI lag, then I may revert to Separation v0.82 again. Matt, I recall you mentioning in the "Separation updated to 1.00" thread that CAL will no longer be supported once SDK v2.7 is released sometime in March. So if I decide to temporarily revert, I'll do so knowing that I must somehow prepare for the final deprecation of Separation v0.82, whether that means sticking with my current setup and dealing with more GUI lag than I'd like, or biting the bullet and adding another discrete GPU (or 2) and a new motherboard to avoid GUI lag altogether.

Dave Kelly
Joined: 21 Sep 09
Posts: 2
Credit: 142,153,832
RAC: 1,941
Message 53404 - Posted: 24 Feb 2012, 9:03:22 UTC - in response to Message 53393.  

Arkayn
Thanks. I think I will revert to an earlier driver.

Dave Kelly.

[AF>EDLS] Polynesia
Joined: 5 Apr 09
Posts: 71
Credit: 6,120,786
RAC: 0
Message 53409 - Posted: 24 Feb 2012, 15:19:49 UTC

I am going to try no longer letting the screen switch off automatically after 10 minutes, and use the BOINC screensaver instead....

Will leaving a blank screensaver running damage my Hannspree HF257 monitor?
Team Alliance francophone, boinc: 7.0.18

GA-P55-UD5, i7 860, Win 7 64 bits, 8g DDR3, GTX 470

Sunny129
Joined: 25 Jan 11
Posts: 271
Credit: 346,072,284
RAC: 0
Message 53410 - Posted: 24 Feb 2012, 17:07:37 UTC
Last modified: 24 Feb 2012, 17:14:06 UTC

Well, apparently the new v1.02 tasks had some "settling in" to do overnight. If you'll recall from my last post, my run times had increased from ~2:05 to ~2:15 in the first hour or so of running the v1.02 application. Then I went to bed. When I woke up this morning, the average run time for v1.02 tasks was up to ~2:40. So it appears that while changing --gpu-target-frequency immediately affects GUI lag, WU run times take quite a few hours to settle around a new average value...so I guess you won't know how a specific change to --gpu-target-frequency affects WU run times until some time after the change is made. Either way, run times have increased by ~30%, which means my production is dropping from ~220,000 PPD to ~170,000. I find this unacceptable and will continue to search for a workaround.
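A quick check of those figures, assuming the credit per task stays the same and the GPU is kept equally busy, so that daily output scales with 1 / (run time). The times are read as minutes:seconds; the ratio is the same if they are hours:minutes.

```python
def to_seconds(clock):
    """Convert an "m:ss" string such as "2:40" to seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


def slowdown(old_time, new_time):
    """Fractional increase in run time, e.g. 0.28 for 28%."""
    return to_seconds(new_time) / to_seconds(old_time) - 1


def scaled_ppd(ppd, old_time, new_time):
    """Daily credit after the run time changes, if credit per task is fixed."""
    return ppd * to_seconds(old_time) / to_seconds(new_time)


if __name__ == "__main__":
    print(f"slowdown: {slowdown('2:05', '2:40'):.0%}")
    print(f"new PPD: {scaled_ppd(220_000, '2:05', '2:40'):,.0f}")
```

Going from 125 s to 160 s is a 28% slowdown, and 220,000 × 125 / 160 = 171,875 PPD, consistent with the ~30% and ~170,000 quoted above.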



Arkayn, if you're fairly confident in your assessment, then it would appear that the solution to getting v1.02 running error-free on the HD 5870 while still using the HD 4290 IGP to run the display (if such a solution is even possible) would be to get BOINC to recognize the HD 5870 as GPU 0 and the HD 4290 as GPU 1. The problem that is immediately apparent with this hypothetical solution is that I believe BOINC automatically recognizes the display/primary GPU as GPU 0. If this is the case, then I have to find a way to enable the IGP in the BIOS, yet trick BOINC (and possibly even Windows) into thinking that it isn't the primary GPU. I would imagine this would also entail tricking Windows into thinking that the HD 5870 is the primary GPU, but not the actual display GPU...and I don't yet know if that's possible. I have an extra dummy plug lying around, so I'm going to play around with the hardware later (if I get a chance) and see if I can't trick BOINC/Windows into doing exactly what I want...

I'll post the results later...

Sunny129
Joined: 25 Jan 11
Posts: 271
Credit: 346,072,284
RAC: 0
Message 53415 - Posted: 25 Feb 2012, 4:24:47 UTC - in response to Message 53410.  

Well, I had no luck with the above...there is no way to have the IGP enabled in Windows if it isn't first set as the initial display device in the BIOS. In other words, the only way to use it is to make it the primary/display GPU; it can't be made available in Windows as a secondary GPU. Without a way around this limitation, BOINC will always recognize the IGP as the primary GPU (GPU 0) and any subsequent discrete GPU as a secondary GPU (GPU 1, GPU 2, etc.). So what is the real solution to running the new OpenCL-based v1.02 MW@H app on a machine whose primary GPU is an OpenCL-incapable IGP dedicated to the display, and whose secondary GPU is the actual OpenCL-capable device? BOINC developers and MW@H developers need to coordinate on things like GPU recognition and OpenCL device enumeration. I was hoping that BOINC v7.x.xx would be the answer, since it's supposed to address a number of GPU computing issues that BOINC v6.x.xx didn't. I understand that the developers focus more on issues affecting machines that crunch on multiple discrete GPUs (as opposed to my somewhat rare situation: a discrete crunching GPU plus an IGP whose lack of capabilities has an adverse effect on BOINC even though it isn't used to crunch at all). But after having to crunch and run the display on the same GPU (in order to get Separation v1.02 tasks to run without error), and doing everything I could with the cmdline parameters to minimize the GUI lag it incurred, I still maintain that the only way to truly eliminate (not just minimize) GUI lag is to dedicate a specific GPU to running the display full-time, whether it's discrete or integrated, and let the other GPU(s) crunch full-time.

I suppose I'm pickier than most, but the slightest amount of GUI lag really bothers me. So I've decided to revert to Separation v0.82 until it's no longer supported or until I can afford to upgrade my hardware to a system that can handle OpenCL tasks without device enumeration issues (like a mobo with 3 or 4 PCIe x16 slots and multiple OpenCL-capable discrete GPUs). Hopefully the latter comes before the former, so that I can just swap out the hardware and switch over to v1.02 seamlessly...of course, hardly anything happens seamlessly in distributed computing, haha.

Axiom
Joined: 28 Sep 11
Posts: 2
Credit: 9,893,636
RAC: 0
Message 54195 - Posted: 27 Apr 2012, 12:53:05 UTC

Hi there

I have a similar issue.
I can confirm that Kashi's post works for me:

<cc_config>
<options>
<exclude_gpu>
<url>http://milkyway.cs.rpi.edu/milkyway/</url>
<device_num>INSERT DEVICE # HERE</device_num>
</exclude_gpu>
</options>
</cc_config>
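The same file can be generated rather than hand-edited. This minimal Python sketch writes BOINC 7's per-project `<exclude_gpu>` entry together with `<use_all_gpus>`, which matches the setup described in this post (all GPUs enabled, MilkyWay kept off one of them). Device 1 is used as the example because that is the number BOINC gives the HD 5450 (Cedar) in the log below; the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

MILKYWAY_URL = "http://milkyway.cs.rpi.edu/milkyway/"


def cc_config_excluding(project_url, device_nums, use_all_gpus=True):
    """Return cc_config.xml text that keeps one project off the given GPUs."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    if use_all_gpus:
        # Without this, BOINC only uses the GPU(s) it considers most capable.
        ET.SubElement(options, "use_all_gpus").text = "1"
    for num in device_nums:
        excl = ET.SubElement(options, "exclude_gpu")
        ET.SubElement(excl, "url").text = project_url
        ET.SubElement(excl, "device_num").text = str(num)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # ATI GPU 1 is the HD 5450 (Cedar) in the log below.
    print(cc_config_excluding(MILKYWAY_URL, [1]))
```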


In BOINC v6 I had cc_config.xml set to enable all GPUs, without further modifications, as it didn't "like" my 5450 in conjunction with my 5850... it would use my 5850.

But BOINC or something seemed to know that my second card, the HD 5450, was not able to run the double-precision MilkyWay apps, whereas my HD 5850 ran them just fine, so the 5850 would crunch away and the 5450 would not crunch the MilkyWay@home work (it was CPU only).

But now on v7, things get weird, and my "enable all GPUs" setting started to cause an issue.

So to make a long story longer...

On my main machine with the two cards, the 5850 crunches fine, and in the time it takes to complete one unit, the 5450 has failed all the rest with "computation error". I normally have a batch of ten or so that download. They try for a few seconds, fail, rinse and repeat.

If I manage to get the 5450 working on another project, MilkyWay@home v1.02 will just go to the 5850, but I have to watch it like a hawk.

04/27/12 8:44:13 AM | | ATI GPU 0: Cypress (CAL version 1.4.1703, 1024MB, 991MB available, 4406 GFLOPS peak)
04/27/12 8:44:13 AM | | ATI GPU 1: Cedar (CAL version 1.4.1703, 1024MB, 991MB available, 240 GFLOPS peak)
04/27/12 8:44:13 AM | | OpenCL: ATI GPU 0: Cypress (driver version CAL 1.4.1703 (VM), device version OpenCL 1.2 AMD-APP (923.1), 1024MB, 991MB available)
04/27/12 8:44:13 AM | | OpenCL: ATI GPU 1: Cedar (driver version CAL 1.4.1703 (VM), device version OpenCL 1.2 AMD-APP (923.1), 1024MB, 991MB available)

Where it gets bizarre is that I have a second machine with ONLY an HD 5450 in it: matching ATI drivers, matching hardware specs, matching BOINC projects and a matching log, blah blah, it's the same. The only difference is that it has no 5850 card.

04/27/12 8:44:13 AM | | ATI GPU 0: Cedar (CAL version 1.4.1703, 1024MB, 991MB available, 240 GFLOPS peak)
04/27/12 8:44:13 AM | | OpenCL: ATI GPU 0: Cedar (driver version CAL 1.4.1703 (VM), device version OpenCL 1.2 AMD-APP (923.1), 1024MB, 991MB available)

It does NOT download or process any MilkyWay@home v1.02 tasks, and it crunches all other ATI-enabled apps. So out of the box it works as expected.

Now, thanks to Kashi and other posters in the thread, the cc_config options to block the 5450 work fine, and both machines are now crunching away error-free.

Here's what I've noticed, though.
On my second machine with only the 5450, BOINC Messages informs me that my GPU lacks double-precision math, as expected, so I assume it's telling MilkyWay@home to use the CPU only.
This does not occur on my dual-card machine. I never get this message there.
My 5850 DOES support double precision, but the 5450 does not, and both cards DO support OpenCL (contrary to other posts).

So it appears that when BOINC does the hardware query, it only takes the first card it reads for its project configurations. I'm tempted to swap the cards, but I don't have time for those shenanigans.

If it's app-specific, then perhaps the MilkyWay@home guys need to add 20 lines of code to check the capability of the additional GPUs, so that people who are total n00bs like me don't have to muck about in cc_config.

thank you and good night.
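The per-GPU check suggested above can be sketched in a few lines of Python. This is not what BOINC or the MilkyWay application actually does; it only illustrates the idea using the OpenCL convention that a device advertises double precision through the `cl_khr_fp64` extension (older AMD drivers used `cl_amd_fp64`). The device records and extension strings below are illustrative placeholders, not output captured from these cards.

```python
from dataclasses import dataclass

# Extensions through which an OpenCL device advertises double-precision support.
FP64_EXTENSIONS = ("cl_khr_fp64", "cl_amd_fp64")


@dataclass
class OpenClDevice:
    index: int       # BOINC's "ATI GPU <n>" number
    name: str
    extensions: str  # space-separated, as CL_DEVICE_EXTENSIONS reports them


def supports_fp64(device):
    return any(ext in device.extensions.split() for ext in FP64_EXTENSIONS)


def double_precision_devices(devices):
    """Indices of the devices a double-precision app could safely be sent to."""
    return [d.index for d in devices if supports_fp64(d)]


# Hypothetical extension lists for the two cards in this post.
DEVICES = [
    OpenClDevice(0, "Cypress (HD 5850)", "cl_khr_global_int32_base_atomics cl_amd_fp64"),
    OpenClDevice(1, "Cedar (HD 5450)", "cl_khr_global_int32_base_atomics"),
]

if __name__ == "__main__":
    print(double_precision_devices(DEVICES))
```

With those inputs only device 0 qualifies, so MilkyWay's double-precision work would stay off the Cedar card without any cc_config edits.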

Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0
Message 54271 - Posted: 1 May 2012, 16:17:16 UTC - in response to Message 54195.  

BOINC's multi-GPU support is completely nonexistent. It only uses the GPU it decides is the "most capable." This is why you need to enable all GPUs with cc_config.xml in the first place.
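A simplified Python model of the behaviour described here, using the peak figures from Axiom's log (Cypress 4406 GFLOPS, Cedar 240 GFLOPS). BOINC's real comparison looks at more than one property; ranking by peak GFLOPS alone is an assumption made to keep the sketch short, and the function name is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    index: int
    name: str
    peak_gflops: float


def gpus_used(gpus, use_all_gpus=False):
    """Indices BOINC would schedule work on under this simplified model."""
    if use_all_gpus:
        return [g.index for g in gpus]
    best = max(g.peak_gflops for g in gpus)
    # Only the top-ranked card (and any exact equals) gets used.
    return [g.index for g in gpus if g.peak_gflops == best]


AXIOM_HOST = [Gpu(0, "Cypress", 4406), Gpu(1, "Cedar", 240)]

if __name__ == "__main__":
    print(gpus_used(AXIOM_HOST))
    print(gpus_used(AXIOM_HOST, use_all_gpus=True))
```

With `use_all_gpus` set, the Cedar card is scheduled too, which is why it then needs an `<exclude_gpu>` entry to keep double-precision MilkyWay tasks off it.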

©2024 Astroinformatics Group