getting errors with new v1.02 separation application?
Author | Message |
---|---|
Send message Joined: 25 Jan 11 Posts: 271 Credit: 346,072,284 RAC: 0 |
like the title says, v1.02 tasks are all erroring out pretty much as soon as they start. the longest only ran for ~3 seconds. i followed the "Separation updated to 1.00" thread very closely to see which ATI driver might work best for my combination of hardware and software, but have had no success thus far. i first started by looking for folks w/ 58xx series GPUs, most of which were running on a Win7 x64 platform...so i had to narrow those search results down to WinXP 32-bit platforms, of which i only found one, belonging to RAMen. i had a look at some of his validated results to see what a successful Stderr output should look like under the new v1.02 application - a typical successful result looks like this: [Stderr output] here is what one of my v1.02 errors looks like: [Stderr output] the bolded part marks the point at which my output starts to differ from a successful result. of all the platforms crunching w/ a 58xx series GPU that i looked at (regardless of OS), they seem to be "using device 0 on platform 0," whereas mine seems to be "using device 1 on platform 0." this makes sense to me b/c my device 0 is actually the motherboard's HD 4290 IGP, while device 1 is my HD 5870 GPU...however i think this device labeling system is causing a GPU recognition problem. the successful platforms are finding the "Cypress" GPU after detecting 1 CL device, whereas mine finds 1 CL device, and then claims that the "requested device is out of range of number found devices." i've gotten this same error and same Stderr output w/ Catalyst drivers 11.8, 11.10, and 12.1. if there is at least one other participant who is able to crunch v1.02 tasks successfully w/ his 58xx series GPU on WinXP x32, then i don't see why i shouldn't be able to...i just have to get to the bottom of this error... does anyone have a clue what might be going on here and what i might have to do in order to fix it? 
i've reverted back to v0.82 for the time being, but i know it's only a matter of time before it gets deprecated for good. at that point, i'll no longer be able to crunch for MW@H unless i can get the v1.02 tasks running error-free. |
Send message Joined: 14 Feb 09 Posts: 999 Credit: 74,932,619 RAC: 0 |
Are you still using the on-board graphics to crunch at another project? If not, you might try disabling it in the cc_config.xml file and see if that might help. |
Send message Joined: 8 May 10 Posts: 576 Credit: 15,979,383 RAC: 0 |
The reason is how BOINC handles device indexing. If you look, the first one is using BOINC 7 and the second one, with the error, is using 6.12.34. Upgrade to BOINC 7 (I think 7.0.15 is the newest), or, since you are using app_info already, you could add <cmdline> --device 0</cmdline> to force it to use that GPU. You have a kind of weird case where you have 2 GPUs that both support CAL but only 1 supports OpenCL. The 4290 is based on an R600 core and doesn't support OpenCL. The older version of BOINC will give a device index of 1 to use the other GPU, based on the CAL detection, which would include the 4290. The OpenCL device index BOINC provides in 7 is correct and uses the OpenCL capable GPU. |
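To make the indexing mismatch a bit more concrete, here is a minimal Python sketch. It only models the explanation above and is not BOINC's or the application's actual code; the `Gpu` class and the `index_in` helper are made up for illustration, and the enumeration order (IGP first) is taken from what was reported in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    supports_cal: bool
    supports_opencl: bool


def index_in(gpus, predicate, target_name):
    """Return the target's position among the GPUs that satisfy predicate.

    Returns None when the target is not visible through that API.
    """
    visible = [g for g in gpus if predicate(g)]
    for i, g in enumerate(visible):
        if g.name == target_name:
            return i
    return None


# Order as described in the thread: the IGP is enumerated before the 5870.
gpus = [
    Gpu("HD 4290 (IGP)", supports_cal=True, supports_opencl=False),
    Gpu("HD 5870", supports_cal=True, supports_opencl=True),
]

# Index of the 5870 when counting CAL devices vs. OpenCL devices.
cal_index = index_in(gpus, lambda g: g.supports_cal, "HD 5870")
opencl_index = index_in(gpus, lambda g: g.supports_opencl, "HD 5870")
print(cal_index, opencl_index)
```

Under these assumptions the same card sits at index 1 in the CAL list but index 0 in the OpenCL list, which is why a CAL-derived index handed to an OpenCL application can point past the end of its device list.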
Send message Joined: 8 May 10 Posts: 576 Credit: 15,979,383 RAC: 0 |
The GPU exclusion should also work with 6.12 |
Send message Joined: 25 Jan 11 Posts: 271 Credit: 346,072,284 RAC: 0 |
"Are you still using the on-board graphics to crunch at another project?" no, i don't use the IGP to crunch at all. and yes, it is already ignored by BOINC via the cc_config.xml file. |
Send message Joined: 25 Jan 11 Posts: 271 Credit: 346,072,284 RAC: 0 |
"The reason is how BOINC handles device indexing. If you look, the first one is using BOINC 7 and the second one, with the error, is using 6.12.34. Upgrade to BOINC 7 (I think 7.0.15 is the newest), or, since you are using app_info already, you could add <cmdline> --device 0</cmdline> to force it to use that GPU." i was hoping to avoid changing my version of BOINC since v6.12.34 has worked so well for me for quite some time. but if that's what it'll take, then i'll do that later tonight and report back. i just hope it doesn't wreak havoc on the functionality of all the other projects i participate in... |
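For reference, the `<cmdline>` element mentioned above goes inside an `<app_version>` block of app_info.xml. The Python sketch below shows one way to set it programmatically. The sample XML is a deliberately stripped-down, hypothetical file (the app name `example_app` and the helper `set_cmdline` are illustrative only, and a real app_info.xml also lists the application's files), so treat it as a sketch rather than a drop-in tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical, stripped-down app_info.xml; a real file also declares the
# application's files, which are omitted here for brevity.
SAMPLE_APP_INFO = """<app_info>
  <app_version>
    <app_name>example_app</app_name>
    <version_num>102</version_num>
  </app_version>
</app_info>"""


def set_cmdline(app_info_xml, args):
    """Set (or create) the <cmdline> element of every <app_version>."""
    root = ET.fromstring(app_info_xml)
    for version in root.iter("app_version"):
        cmdline = version.find("cmdline")
        if cmdline is None:
            cmdline = ET.SubElement(version, "cmdline")
        cmdline.text = args
    return ET.tostring(root, encoding="unicode")


updated = set_cmdline(SAMPLE_APP_INFO, "--device 0")
print(updated)
```

Calling `set_cmdline` again with a different argument string overwrites the existing element instead of adding a second one, which makes it easy to flip between `--device 0` and `--device 1` while testing.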
Send message Joined: 25 Jan 11 Posts: 271 Credit: 346,072,284 RAC: 0 |
ok, so i gave BOINC v7.0.15 a try and had no luck. the error seems to be of the same nature as the ones i was getting w/ v1.02 tasks on BOINC v6.12.34: [Stderr output] by the way, i had to detach and reattach to the project to even get a task in the first place. prior to that, i was getting the "not reporting or requesting tasks" crap in the event log. at that point i thought i would give BOINC v7.0.12 a try, as that is the version that is working for RAMen's 58xx series GPU WinXP 32-bit system...but again, no luck. i also had to detach and reattach from the project again to get a task and avoid the "not reporting or requesting tasks" message in the event log. and again, the error seems to be of the same nature as before: [Stderr output] i'm not sure what to try next. i will say that i added the device argument to the app_info.xml, but that changed nothing, and the stderr output error was the same. i also maintain that it's just as big a problem that my host doesn't want to report or request tasks now that i've updated to BOINC >= v7.x.xx, and that i have to detach and reattach just to get a single task. any ideas? |
Send message Joined: 25 Jan 11 Posts: 271 Credit: 346,072,284 RAC: 0 |
*UPDATE* i let the system crunch overnight on some other projects with BOINC v7.0.15 and Catalyst 12.1 since i couldn't get MW@H running, and as i suspected, switching to BOINC v7.0.15 has had some adverse effects on my other projects. specifically, Einstein@Home crunches just fine on the CPU, and Collatz crunches just fine on the GPU, but neither of them reported any completed tasks overnight, nor did my host request any new work from either project. so it would appear that MW@H isn't the only project that has seen the "not reporting or requesting tasks" message in the event log since switching to BOINC v7.0.15. |
Send message Joined: 30 Dec 07 Posts: 311 Credit: 149,490,184 RAC: 0 |
Since it's giving errors on Device 1, I assume that is the card to exclude. So you tried the following cc_config.xml with a BOINC 7.0.xx version and it still didn't work? <cc_config> <options> <exclude_gpu> <url>http://milkyway.cs.rpi.edu/milkyway/</url> <device_num>1</device_num> </exclude_gpu> </options> </cc_config> You know about the different work buffer system of BOINC 7.0.xx versions? Connect about every x.xx days has now effectively become Minimum work buffer. In fact in the later 7.0.xx versions it has been renamed. If you leave it at 0 days which was previously recommended for an always on connection it will not download any new tasks until your cache is empty. With BOINC 7.0.15 I use a value of 1 day for Minimum work buffer and 0.1 days for Max additional work buffer. Due to unreliable work availability on another project I also use report_results_immediately in my cc_config.xml file. |
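If it helps to avoid typos when hand-editing, here is a rough Python sketch that generates the same `<exclude_gpu>` structure quoted above. The function name `build_exclude_config` is made up for illustration, and which device number to pass is exactly the open question in this thread, so the value 1 below simply mirrors the suggestion above.

```python
import xml.etree.ElementTree as ET


def build_exclude_config(project_url, device_num):
    """Build a cc_config.xml document excluding one GPU for one project."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    exclude = ET.SubElement(options, "exclude_gpu")
    ET.SubElement(exclude, "url").text = project_url
    ET.SubElement(exclude, "device_num").text = str(device_num)
    return ET.tostring(root, encoding="unicode")


# Project URL and device number as quoted in the post above.
config_xml = build_exclude_config("http://milkyway.cs.rpi.edu/milkyway/", 1)
print(config_xml)
```

The resulting text would be saved as cc_config.xml in the BOINC data directory, and the client has to re-read its config files or be restarted before the change takes effect.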
Send message Joined: 25 Jan 11 Posts: 271 Credit: 346,072,284 RAC: 0 |
"Since it's giving errors on Device 1, I assume that is the card to exclude. So you tried the following cc_config.xml with a BOINC 7.0.xx version and it still didn't work?" actually no - i used a GPU inclusion argument in the project's app_info.xml file (<cmdline> --device 1</cmdline>), not a GPU exclusion argument in the cc_config.xml file. and actually device 1 is not the device to ignore b/c that's my double precision-capable 5870. i suppose i should give this a try though, only i'll exclude device 0 instead of device 1. i should note that i already have an <ignore_ati_dev>0</ignore_ati_dev> argument in the cc_config.xml file so that BOINC ignores my motherboard's integrated HD 4290 GPU. "You know about the different work buffer system of BOINC 7.0.xx versions? Connect about every x.xx days has now effectively become Minimum work buffer. In fact in the later 7.0.xx versions it has been renamed. If you leave it at 0 days which was previously recommended for an always on connection it will not download any new tasks until your cache is empty. With BOINC 7.0.15 I use a value of 1 day for Minimum work buffer and 0.1 days for Max additional work buffer. Due to unreliable work availability on another project I also use report_results_immediately in my cc_config.xml file." i read something about that, but i don't think that's what's affecting my ability to get new work. first of all, my "connect every x.xx days" was set to 0.10, not 0...and on top of that, i ran clean out of Collatz work overnight, and my host still didn't have any new Collatz work in the morning. so i met BOINC v7.0.15's requirement of running the Collatz project cache dry before requesting new tasks, and still it didn't fetch any... |
Send message Joined: 30 Dec 07 Posts: 311 Credit: 149,490,184 RAC: 0 |
Hmm, that's strange because the Stderr you posted of the error task is saying Device 1 is being used or rather trying to be used. Perhaps BOINC and CAL applications are identifying Device 0 and Device 1 differently than how the MilkyWay OpenCL application is identifying them. If you use multiple GPU exclusions and inclusions at the same time, perhaps it causes differences in different places in how the Devices get numbered. Yes work fetch on BOINC 7.0.xx versions caused me a lot of problems at first too. Seems to work alright with my current settings now though but I'm not doing any Collatz. Maybe my report results immediately setting is helping to cause new work to be requested. |
Send message Joined: 30 Dec 07 Posts: 311 Credit: 149,490,184 RAC: 0 |
Excuse the double post, but I just noticed Matt advised you to try <cmdline> --device 0</cmdline> and you said you used <cmdline> --device 1</cmdline>. That kind of lines up with what I was saying and what Matt has already posted. The CAL applications and the OpenCL applications appear to be handling the device numbering differently due to the onboard graphics not being OpenCL capable. So there is only one device being detected by the OpenCL application. If you or the OpenCL application tries to force or use Device 1, then that is outside the valid range, since with one device found the only valid index is 0, hence the message "Requested device is out of range of number found devices": + if (clr->devNum >= nDev) + { + warn("Requested device is out of range of number found devices\n"); Whereas your successful CAL tasks have "Found 2 CAL devices. Chose device 1" in stderr. Getting the excluded GPU and the detected GPU correct may require different combinations of ignore, exclude and force arguments for OpenCL applications as compared to CAL applications. In other words, what works for one may not work for the other, as you have experienced. If there is only one OpenCL device detected and you use <ignore_ati_dev>0</ignore_ati_dev>, perhaps that leaves no available OpenCL devices. The <exclude_gpu> cc_config settings available in BOINC 7.0.xx give greater flexibility in configuring all this separately for each GPU project or application. So if there is only one OpenCL capable device, it may be sufficient to exclude the HD 4290 for CAL applications only. So perhaps try removing <cmdline> --device 1</cmdline> and <ignore_ati_dev>0</ignore_ati_dev> and instead use: <cc_config> <options> <exclude_gpu> <url>http://boinc.thesonntags.com/collatz/</url> <device_num>0</device_num> </exclude_gpu> </options> </cc_config> Not sure if it will work but worth a try. |
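The range check quoted above can be modelled in a few lines of Python. This is only a sketch of the logic, not the application's source: the function name `select_device` is hypothetical, and raising an exception stands in for the `warn(...)` call, since what the real code does after the warning is not shown in the thread.

```python
def select_device(requested, n_found):
    """Model of the check: device indices are zero-based, so the
    requested index must be strictly less than the number found."""
    if requested >= n_found:
        raise ValueError(
            "Requested device is out of range of number found devices"
        )
    return requested


# With a single OpenCL device found, only index 0 is acceptable.
for requested in (0, 1):
    try:
        print("using device", select_device(requested, 1))
    except ValueError as err:
        print("device", requested, "rejected:", err)
```

With one device found, a request for device 0 passes and a request for device 1 is rejected, which matches the pattern in the failing tasks described earlier in the thread.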
Send message Joined: 25 Jan 11 Posts: 271 Credit: 346,072,284 RAC: 0 |
actually i tried both <cmdline> --device 0</cmdline> and <cmdline> --device 1</cmdline>. nevertheless, it's probably worth a try to remove both <cmdline> --device x</cmdline> from the app_info.xml and <ignore_ati_dev>x</ignore_ati_dev> from the cc_config.xml for now, and add what you suggested to the cc_config.xml file. it appears i have a few more things to experiment with, so i'll try to get on it tonight and report back as soon as i can... thanks, Eric |
Send message Joined: 8 May 10 Posts: 576 Credit: 15,979,383 RAC: 0 |
"actually i tried both" I should have said --device 0 before. Removing that and using BOINC 7 should work. |
Send message Joined: 30 Dec 07 Posts: 311 Credit: 149,490,184 RAC: 0 |
"I should have said --device 0 before. Removing that and using BOINC 7 should work" You did say --device 0 before. "The reason is how BOINC handles device indexing. If you look, the first one is using BOINC 7 and the second one, with the error, is using 6.12.34. Upgrade to BOINC 7 (I think 7.0.15 is the newest), or, since you are using app_info already, you could add <cmdline> --device 0</cmdline> to force it to use that GPU...." |
Send message Joined: 14 Feb 09 Posts: 999 Credit: 74,932,619 RAC: 0 |
"I should have said --device 0 before. Removing that and using BOINC 7 should work" Newest version is now 7.0.17, which fixes the backup project problem and several other problems. |
Send message Joined: 30 Dec 07 Posts: 311 Credit: 149,490,184 RAC: 0 |
Backup project problem. Aha, I hadn't thought of that as I do not use a backup project. Perhaps that was why Sunny129/Eric was having trouble getting any work for Collatz with the BOINC 7.0.xx versions he had tried. He may have had Collatz set with a resource share of 0 as a backup project. Thanks for that news arkayn. |
Send message Joined: 25 Jan 11 Posts: 271 Credit: 346,072,284 RAC: 0 |
i was unaware of the "backup project" problem...though i don't think it should affect me, as i do not use Collatz strictly as a backup project. i actually have Collatz and MW@H set to use equal [non-zero] resources, and i switch between the two of them by keeping one or the other suspended. at any rate, i went back to BOINC v6.12.34 last night simply b/c i was unable to crunch either of my 2 GPU projects on v7.0.12 or 15. i set "no new tasks," so i should be able to start testing again, this time on v7.0.17. |
Send message Joined: 25 Jan 11 Posts: 271 Credit: 346,072,284 RAC: 0 |
ok guys, on BOINC v7.0.17 i've tried excluding, ignoring, and forcing devices 0 (integrated HD 4290) and 1 (HD 5870) just for the heck of it, and nothing worked. i also removed all exclude, ignore, and force arguments and i'm still getting the same errors as before. i don't know why i didn't think of it before, but perhaps i should post my BOINC client's startup log. i'm not sure if it'll help, but there's a good chance that someone will see or understand something that i did not...here it is with no cc_config.xml file in the BOINC data directory and no app_info.xml file in the MW@H data directory: "2/20/2012 11:42:10 PM: No config file found - using defaults" if this doesn't help, i'm not sure i'm up to the task of trying all of the several combinations of exclude, ignore, and force arguments just to see if any such combination works. it looks like it might finally be time to stop using the IGP as a dedicated display GPU and start getting used to using the HD 5870 for the display AND crunching...who knows, maybe i'll find some target parameters that'll make GUI lag at least bearable. btw Matt, any idea how much time we have left before separation v0.82 is permanently deprecated? |
Send message Joined: 14 Feb 09 Posts: 999 Credit: 74,932,619 RAC: 0 |
Is there any way to disable the internal GPU via the BIOS? It looks like the 5870 is being disabled for OpenCL because the internal GPU is being disabled as the lesser GPU. |