Message boards :
Number crunching :
Problem Clients
Author | Message |
---|---|
Zydor Joined: 24 Feb 09 Posts: 620 Credit: 100,587,625 RAC: 0 |
You are running an app_info.xml, and depending on its contents, that is probably the cause of the problem. I am reading between the lines from what you have said, and it's likely that the app_info file is specifying the wrong application file, so every WU will fall over. You need to get rid of the app_info for now (maybe for good, look at that later), and detach & re-attach the machine to MW. You will find the app_info.xml file in the MilkyWay directory. Suggest you either rename it "app_infoOLD.xml", or move it to another directory, or delete it. Only delete it if you are sure you will not need its contents to refer to (I doubt you will, rebuilding another is easy). Go to BOINCStats and in the Host details for that machine, detach it, press "set" at the bottom of the table, refresh the page, change it back to attached and press set again. Once that's done, close BOINC (if not done so already), and restart it. It *should* be ok, though there may be other things wrong - one step at a time. You should then have the new 0.62 app downloading and crunching, as your driver is up to date (assuming you loaded the one with the APP set). If it falls over, post again. Regards Zy |
Joined: 24 Dec 07 Posts: 1947 Credit: 240,884,648 RAC: 0 |
I agree with Zy. I have a 3850 crunching quite happily with 11.3 AGP hot fix WITH the APP set and without the use of the app_info file. |
Volunteer developer Joined: 17 Feb 08 Posts: 363 Credit: 258,227,990 RAC: 0 |
The problem is not the 3850; it's your CPU that causes the crash. The app is compiled with SSE2 enabled and every time the app tries to calculate the likelihood on the CPU, it crashes. There's nothing you can do about that ATM. |
Joined: 25 May 09 Posts: 6 Credit: 22,212,608 RAC: 32 |
Thanks guys for your answers. @Crunch3r: what bad news! Happy Easter to all of you. Marco. |
Alinator Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
[quote]..... The $64 question is that up until the recent server issues both GPUs ran fine, in fact they ran at similar speeds.[/quote] I've had the same problem lately with M4A78T-E (HD3300) and my HD 4850. However, the problem seems to be with the BOINC CC itself ignoring the fact that MW GPU WU's can't run on an SP GPU and trying to start them on it anyway. The cc_config workaround is fine if all you run is MW, but it disables the SP GPU for all projects. So my question is, is there a way to disable the integrated GPU with an MW app_info file, rather than use the cc_config approach? |
Zydor Joined: 24 Feb 09 Posts: 620 Credit: 100,587,625 RAC: 0 |
Doubt it. It's designed to point at the application that's "running", not to specify which GPU to use, and the full syntax has no documented elements for the latter. There might be some undocumented elements floating around, but I can't see that, as an app_info points to an application to run, it doesn't select what GPU to use. "cmdline" may be a hope, but even that is clutching at straws, as it applies only to the application specified at the top of the file, so again no GPU control. Suspect you'll be out of luck on an app_info approach. You can run a second BOINC Client concurrently; it needs a bit of installation configuration to stop it clashing with the current Client, but it can be done. In that you could set the "second" client's cc_config to point at the GPU you want. It's a way of doing it .... watch the config carefully, and the version numbers of the Client, to make it work. Post again if you need a pointer on setting it up. Regards Zy |
Alinator Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
Yep, I finally got into the BOINC Wiki (don't know what its problem has been lately), and didn't see an obvious way to do it. Not surprising given its purpose is exactly as you described. What I don't understand though is how we could get to version 60 in the series and still have such a glaring task scheduling fault in the CC!!?? Just for reference, this is the scenario on my host: I run MW and Collatz on the GPU's with an equal Resource Share. Most of the time it runs fine. However, since Collatz can run tasks on all three devices and MW can only use the 2 DP devices (the 4850 splits into 2 virtual ones since there are 2 monitors hooked up to it), eventually Collatz will gain more net GPU run time than MW. So the CC shuts down work fetches for Collatz to let MW catch back up. The trouble occurs if the Collatz task running on the SP GPU finishes before MW has caught up and there is nothing else left in the GPU cache to run except MW. So instead of correctly detecting the fact there is nothing suitable to run on the SP GPU and allowing a fetch from Collatz, it sets off a cascade failure on MW. In one case I had about a month ago, it not only killed all the MW tasks I had on board but got into a condition where it was failing all tasks, CPU included! I suspect the reason for that was I wasn't home to catch the failure early enough to keep the cascade from getting completely out of control. Edit: Thinking back to that major failure I had a few weeks ago, I think the problem is even worse than I outlined above. At that time I was running a 1 day work cache setting, so I'm pretty sure there was a Collatz task which could have run on the SP GPU when the time came to let MW catch up. IOW, the CC still tried to start the MW task on the SP GPU, even though it "knew" it wasn't possible AND there was a suitable alternative available. Needless to say I wasn't too happy about that! |
Sunny129 Joined: 25 Jan 11 Posts: 271 Credit: 346,072,284 RAC: 0 |
[quote]I've had the same problem lately with M4A78T-E (HD3300) and my HD 4850.[/quote] i have the same M4A78T-E w/ the integrated HD 3300 GPU, only my discrete GPU is a 5870. i had the same problem with BOINC ignoring the fact that MW@H can't use the 3300 GPU, and had to resort to using a cc_config file too. i also like to crunch S@H MB & AP w/ my 5870. b/c the cc_config file affects all projects, and b/c it specifically causes adverse effects for S@H, i can only run MW@H or S@H (not both at the same time). i have to edit the cc_config (to change which device is being ignored) and restart BOINC every time i switch between the MW@H and S@H projects. if i don't do that first before making the switch, WU's begin to error out one after another. it's a PITA, but at least we have some control over it. in short, there is no way to disable the 3300 for MW@H only via an app_info file. when i originally started troubleshooting the above problem, i was running S@H, not MW@H. the entire history can be found in a thread over at the SETI@Home message boards. you'll note that i had help from several testers, as well as the gentleman who wrote the optimized SETI@Home GPU applications code himself. the only current solution for a system running 2 or more GPU's (one of which is not capable of running MW@H) is to use a cc_config file that ignores that particular GPU. |
Skyflash Joined: 13 Dec 09 Posts: 9 Credit: 10,302,632 RAC: 0 |
I also have the same problem (calculation error) with my Radeon HD 4850 1Gb graphics card. I cannot find the app_info.xml in the Milkyway directory and don't know where to detach and reattach the machine to MW. Hope someone can help. |
Sunny129 Joined: 25 Jan 11 Posts: 271 Credit: 346,072,284 RAC: 0 |
[quote]I also have the same problem (calculation error) with my Radeon HD 4850 1Gb graphics card. I cannot find the app_info.xml in the Milkyway directory and don't know where to detach and reattach the machine to MW. Hope someone can help.[/quote] are you sure it's the same problem? what exactly is the nature of your calculation error? in other words, does your client at least crunch 1 MW@H task normally while all others error out one after another? or do all MW@H tasks error out the instant you start BOINC or un-suspend the MW@H project? if it's the latter, then it isn't the same problem and i wouldn't know how to tackle it right away. if it's the former, then BOINC is recognizing 2 GPU's (your 4850 and your mobo's integrated GPU) and trying to start a 2nd task on the integrated GPU, which in turn causes the task to error out immediately b/c the integrated GPU isn't capable of performing the double-precision floating point operations required by MW@H. again, in this case, the only solution would be to place a cc_config.xml file in the main BOINC data directory with the following contents: `<cc_config><options><ignore_ati_dev>n</ignore_ati_dev></options></cc_config>` ...where "n" is the # of the device you want BOINC to ignore. |
Alinator Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
[quote]I've had the same problem lately with M4A78T-E (HD3300) and my HD 4850.[/quote] LOL... Yes, I was watching your conversation over there last month. After reviewing it I can see you have a pretty good "layman's" handle on the problem, although I disagree with some of the analysis. I don't think the CC actually does a re-enumeration of potential target GPU capabilities every time before it starts a new task. This was implied by a couple of the replies, but maybe it should! ;-) I think it just goes back to client_state, and since with the hardware config we are running it gets mistakenly stored that all our devices are DP capable, it's not too surprising the scheduler thinks it's cool to try and start MW on the 3300. |
Skyflash Joined: 13 Dec 09 Posts: 9 Credit: 10,302,632 RAC: 0 |
[quote]are you sure its the same problem? what exactly is the nature of your calculation error? in other words, does your client at least crunch 1 MW@H task normally while all others error out one after another? or do all MW@H tasks error out the instant you start BOINC or un-suspend the MW@H project? if its the latter, then isn't the same problem and i wouldn't know how to tackle the problem right away. if its the former, then BOINC is recognizing 2 GPU's (your 4850 and your mobo's integrated GPU) and trying to start a 2nd task on the intgerated GPU, which in turn causes the task to error out immediately b/c the integrated GPU isn't capable performing the double-precision floating point operations required by MW@H. again, in this case, the only solution would be to place a cc_config.xml file in the main BOINC data directory with the following contents:[/quote] Ok, I just found out that my Intel Core i5 has an integrated GPU, didn't know that ;) But I'm not sure where the data directory is, I put the file in the /programfiles/boinc directory but nothing happened. Also put it in the /programdata/boinc dir, nothing happened too. I am guessing the "n" should be 1 here? Cause my Ati GPU still works in Boinc. But maybe it is not the first problem, cause I never had a normal finished MW-workunit since the problem started. Collatz works ok. |
Sunny129 Joined: 25 Jan 11 Posts: 271 Credit: 346,072,284 RAC: 0 |
[quote]Ok, I just found out that my Intel Core I5 has an integrated GPU, didn't know that ;) But I'm not sure where the data directory is, I put the file in the /programfiles/boinc directory but nothing happened. Also put it in the /programdata/boinc dir, nothing happened too. I am guessing the "n" should be 1 here? Cause my Ati GPU still works in Boinc. But maybe it is not the first problem, cause I never had a normal finished MW-workunit since the problem started. Collatz works ok.[/quote] regarding the location of your BOINC data directory, it'll depend on your operating system. my OS is Windows XP Pro 32-bit, so the location of my BOINC data directory looks like this: C:\Documents and Settings\All Users\Application Data\BOINC you can delete the cc_config.xml file from the BOINC directory in your program files folder - that is not the BOINC data directory, it is simply the location where you installed BOINC. i'm not sure about the /programdata/boinc directory you mentioned b/c again, i'm running WinXP and am not familiar with that directory...perhaps someone else can verify that the /programdata/boinc directory is in fact the BOINC data directory for another OS. with regard to editing the cc_config.xml file to ignore a particular device, you first must reference the BOINC message log on start-up. if BOINC has been running for some time, just scroll to the top of the message log to view any start-up dialogue. that start-up dialogue will show you the #'s that have been assigned to your GPUs. on my rig, BOINC recognizes the 3300 as GPU 0 and the 5870 as GPU 1, so in order for MW@H to run normally, i have to edit the cc_config.xml file to look like this: `<cc_config><options><ignore_ati_dev>0</ignore_ati_dev></options></cc_config>` that way BOINC is forced to ignore GPU 0 (my mobo's integrated 3300 GPU). hence tasks never error out b/c they all get sent to the 5870 and none get sent to the 3300. |
Zydor Joined: 24 Feb 09 Posts: 620 Credit: 100,587,625 RAC: 0 |
For SkyFlash on a 64bit OS the data directories are: BOINC: C:\ProgramData\BOINC (put cc_config here for 64 bit). The ProgramData directory for 64bit is hidden by default; if it's not seen, unhide it via the display options for directories. For SkyFlash on 64bit, the app_info goes into the MW project directory: MW Data Directory: C:\ProgramData\BOINC\projects\milkyway.cs.rpi.edu_milkyway Regards Zy |
Skyflash Joined: 13 Dec 09 Posts: 9 Credit: 10,302,632 RAC: 0 |
Ok, I see now that my ATI GPU is #0 but I don't see another GPU and therefore no other # that I can put in the cc file. Also the first message in the message log is "unrecognized tag in cc_config.xml: `<ignore_ati_dev>`" And if I put ignore #0 (my ati gpu) it will still work, so something is still wrong there. @Zy: I don't have an app_info.xml file in that directory you mentioned. (I indeed have Windows 7 64bit) Should I also create one? And what should I put in it? |
Alinator Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
Hmmm... At this point I think we need some more info about the host in question. First off, knowing what kind of motherboard and chipset it has would be a help. Posting the startup part of the BOINC Event Log/Message Tab would be helpful as well. The lines where it lists the GPU hardware enumeration are the important part for getting started on troubleshooting this. One thing I ran into with my host is that if I chose to make the discrete GPU the Primary Adapter, then I had to enable another BIOS parameter to allow the IGP to function as the secondary adapter and thus be detected by BOINC. For my ASUS it was called Surround View in the BIOS, IIRC. Edit: BTW, the error message typically means you didn't get the syntax quite right. My guess would be missing the leading slash in the closing element tag, or forgetting the closing tag completely (the two most common mistakes I make). ;-) Edit 2: After looking over the failed tasks for your host I noticed you're running 6.10.18 for a CC. You might want to try updating to a newer version and see if that helps any. |
Zydor Joined: 24 Feb 09 Posts: 620 Credit: 100,587,625 RAC: 0 |
[quote]..... Ok, I see now that my ATI GPU is #0 but I don't see another GPU and therefore no other # that I can put in the cc file. Also the first message in the message log is "unrecognized tag in cc_config.xml: `<ignore_ati_dev>`"[/quote] You need a minimum of 6.10.19 to use that ignore ATI command; 6.10.18 and previous versions know nothing about it, hence the error message you are getting when using it. If you update, go to 6.12.22 - they had a drama with 6.12.23 & 6.12.24, and the latest 6.12.25 *looks* as if they solved the issues (caused by a MS update concerning a security vulnerability). I would stay at 6.12.22 for now until the dust settles a little on the issues they had; that version recognises all the Config commands you need. The full download list for the BOINC application is at: BOINC Downloads All Versions. The full list of config commands is at: cc_config.xml Command List. The command useful for the problem you have (assuming the machine and BOINC can see both GPUs - look at the start of the BOINC Client messages, it will say what GPUs it sees) is: `<ignore_cuda_dev>N</ignore_cuda_dev>`, `<ignore_ati_dev>N</ignore_ati_dev>`: ignore (don't use) a specific NVIDIA or ATI GPU. You can ignore more than one. New in 6.10.19. The app_info deals with application related issues, not applicable to this unless you are pointing a specific application at a specific GPU. App_Info instructions are at: app_info.xml Instructions. If you want to check the contents of your cc_config before using it, paste it into a post, there is always someone around to check it. If you don't know where to start putting one together, post. Regards Zy |
Skyflash Joined: 13 Dec 09 Posts: 9 Credit: 10,302,632 RAC: 0 |
[quote]You need a minimum of 6.10.19 to use that ignore ATI command, 6.10.18 and previous versions know nothing about that command, hence the error message you are getting when using it. If you update, go to 6.12.22 - they had a drama with 6.12.23 & 6.12.24, the latest 6.12.25 *looks* as if they solved the issues (caused by a MS update concerning a security vulnerability).[/quote] I updated the Boinc manager with 6.12.22 and now the cc_config works... but only on my Ati gpu, I don't know if there is another GPU I can shut down. Below is the startup message log from Boinc. As you can see I ignore ATI Gpu 1 (because the normal is 0) and that doesn't do anything. 27-4-2011 2:52:07 | | Starting BOINC client version 6.12.22 for windows_x86_64 27-4-2011 2:52:07 | | Config: ignoring ATI GPU 1 27-4-2011 2:52:07 | | log flags: file_xfer, sched_ops, task 27-4-2011 2:52:07 | | Libraries: libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.5 27-4-2011 2:52:07 | | Data directory: C:\ProgramData\BOINC 27-4-2011 2:52:07 | | Running under account Arjen 27-4-2011 2:52:07 | | Processor: 4 GenuineIntel Intel(R) Core(TM) i5 CPU 750 @ 2.67GHz [Family 6 Model 30 Stepping 5] 27-4-2011 2:52:07 | | Processor: 256.00 KB cache 27-4-2011 2:52:07 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 syscall lm vmx smx tm2 popcnt pbe 27-4-2011 2:52:07 | | OS: Microsoft Windows 7: Home Premium x64 Edition, (06.01.7600.00) 27-4-2011 2:52:07 | | Memory: 3.99 GB physical, 7.98 GB virtual 27-4-2011 2:52:07 | | Disk: 931.41 GB total, 53.69 GB free 27-4-2011 2:52:07 | | Local time is UTC +2 hours 27-4-2011 2:52:07 | | ATI GPU 0: ATI Radeon HD 4700/4800 (RV740/RV770) (CAL version 1.4.467, 1024MB, 1200 GFLOPS peak) |
Zydor Joined: 24 Feb 09 Posts: 620 Credit: 100,587,625 RAC: 0 |
[quote]....... I updated the Boinc manager with 6.12.22 and now the cc_config works... but only on my Ati gpu, I don't know if there is another GPU I can shut down. Below is the startup message log from Boinc. As you can see I ignore ATI Gpu 1 (because the normal is 0) and that doesn't do anything.....[/quote] We need to be clear what it is you are trying to achieve, and how many GPUs are live (with no exclude statement in the cc_config, the BOINC messages in the PC BOINC Client will tell you the latter and their outline type). If, for example, you want the screen attached to an integrated GPU on the motherboard (not all motherboards have them, and not all that do can be used at MW), then that IGP needs to be made live in the BIOS. Having done that, the screen is attached to the IGP socket on the motherboard and a driver loaded for it, as it's now a live device. If you wish to exclude that IGP from BOINC, that's when the cc_config is used to exclude the GPU number reported in the BOINC messages for that IGP. Other combinations/solutions etc - all depends what you want to achieve. Regards Zy |
Skyflash Joined: 13 Dec 09 Posts: 9 Credit: 10,302,632 RAC: 0 |
What do you mean with "screen"? I don't know much about all you guys are saying, I really appreciate the help though. But I only want my Ati GPU to do some MW-work again, like before. The messages I posted only mention the Ati GPU, so you think the integrated (non-live) GPU isn't the problem? |
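
For reference, the cc_config.xml workaround discussed in this thread can be sketched as below. This is a minimal sketch, not a file taken from any poster's machine: the `<ignore_ati_dev>` tag is the one Zydor quotes from the command list, the surrounding `<cc_config>` and `<options>` elements are the usual wrapper for client options, and the device number 0 is only an example matching Sunny129's rig (integrated HD 3300 as GPU 0). Use whatever number your own BOINC startup messages report for the GPU that cannot run MW@H.

```xml
<cc_config>
  <options>
    <!-- Example value only: 0 is the integrated HD 3300 on Sunny129's rig.
         Replace it with the device number shown in your own startup log. -->
    <ignore_ati_dev>0</ignore_ati_dev>
  </options>
</cc_config>
```

The file goes in the BOINC data directory (C:\ProgramData\BOINC on Skyflash's Windows 7 machine, C:\Documents and Settings\All Users\Application Data\BOINC on Sunny129's XP machine), and BOINC needs a restart to pick it up. As Zydor notes, the client must be 6.10.19 or newer, otherwise the tag is reported as unrecognized. If the setting is accepted, the startup log should contain a line like the "Config: ignoring ATI GPU 1" entry in Skyflash's log. For an NVIDIA card the equivalent tag is `<ignore_cuda_dev>`. If your client version objects to the comment, it can simply be deleted.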
©2023 Astroinformatics Group