Conflict MW (ATI) & Aqua CPU
Author | Message |
---|---|
Joined: 9 Nov 07 Posts: 151 Credit: 8,391,608 RAC: 0 |
I'm attempting to run these two projects simultaneously, but I'm running into difficulties: Aqua uses all 4 of my processors and then goes into High Priority, which stops MW from crunching. This is just a single Aqua CPU WU with a three-week deadline, so there is no problem finishing on time. Does anyone have any suggestions? I've done a quick search of the boards and so far have found no comparable report. |
Joined: 28 Aug 07 Posts: 16 Credit: 70,797,368 RAC: 0 |
I see you are using BOINC 6.6.20... Maybe try 6.5.0 or one of the later 6.4.x clients. 6.5.0 works pretty well for me with AQUA on the CPU and Milkyway on the GPU. |
Joined: 30 Dec 07 Posts: 311 Credit: 149,490,184 RAC: 0 |
I don't know how well changing to 5 BOINC CPUs would work on a quad, but for my W3520 (=i7) I change BOINC CPUs to 9 with <ncpus>9</ncpus> in a cc_config.xml file. This allows MilkyWay ATI tasks to continue processing with AQUA on the other 8 "cores". If I don't do this, MilkyWay ATI stops after completing its currently running tasks whenever AQUA is running. I am using BOINC version 6.6.31. When I manually switch to another CPU project that is not multi-threaded, I change back to 8 CPUs and process 7 tasks of the CPU project. I prefer to leave a core free for MilkyWay ATI, because it slows down a great deal if I leave ncpus at 9 and process 8 intensive tasks such as Einstein. You don't need to stop BOINC to do this; just click "Read config file" in the Advanced section of BOINC Manager after you have made a change. The only small delay is that BOINC runs a benchmark every time the number of cores changes. I suppose I should try starting BOINC with the --skip_cpu_benchmarks option, but I think that may only work in Linux. Recently I have been swapping between 8 and 9 cores a fair bit as I switch between AQUA and WCG, so I keep a shortcut to cc_config.xml on my desktop: a quick right-click, open in Notepad, and change the number of cores. |
Joined: 2 Jan 08 Posts: 123 Credit: 69,810,136 RAC: 1,121 |
Well, Milkyway and Aqua both run in High Priority on my machines. They still both seem to work, switching when required. I have left the Linux machine 'as is', and it runs CPU Milkyway WUs OK as well as AQUA, Climate Prediction, Docking and QMC. The Windows machine runs Milkyway on an ATI HD4870 with AQUA, Docking and Spinhenge on the CPUs. I modified the cc_config.xml file on the Windows machine to show 5 CPUs (I have 4) and use the 6.4.7 BOINC client with 8.12 ATI drivers on a Win XP SP3 install. Milkyway runs 3 jobs at once, as per the default settings for the card. With AQUA using up to 4 CPUs, it still leaves room for Milkyway to run on the ATI GPU. It has been working a treat so far. The High Priority appears to have begun with the last couple of batches of AQUA work units, but for me it is not causing a problem. Conan. |
Joined: 4 Oct 08 Posts: 1734 Credit: 64,228,409 RAC: 0 |
"I modified the cc_config.xml file on the Windows machine to show 5 CPUs (I have 4) and use the 6.4.7 BOINC client with 8.12 ATI drivers on a Win XP SP3 install." Regarding your modification to the cc_config.xml file: I am a little mystified as to where this file is located. Can you enlighten me, please? When I go to the projects folder and open the MW one, I can only see the app_info file, which has the following CPU-related statements: <app_version> <app_name>milkyway</app_name> <version_num>19</version_num> <flops>1.0e11</flops> <avg_ncpus>0.1</avg_ncpus> <max_ncpus>1</max_ncpus> <cmdline>n2 f20 w0.80</cmdline> <file_ref> <file_name>astronomy_0.19_ATI_SSE2f.exe</file_name> <main_program/> If this is the file you refer to, which of the CPU references do you alter: <max_ncpus>1</max_ncpus> from 1 to 5, or the nX in <cmdline>n2 f20 w0.80</cmdline> from, in my case, 2 to 5? Go away, I was asleep |
Joined: 30 Dec 07 Posts: 311 Credit: 149,490,184 RAC: 0 |
It's not the app_info.xml file in C:\ProgramData\BOINC\projects\milkyway.cs.rpi.edu_milkyway. It's the cc_config.xml file in C:\ProgramData\BOINC. If you haven't created one, it may not exist. Just create one in Notepad and save it as cc_config.xml. Make sure the extension is saved as .xml, not .txt. Here are the current contents of mine, for example: <cc_config> <options> <zero_debts>1</zero_debts> <ncpus>9</ncpus> </options> </cc_config> |
Joined: 4 Oct 08 Posts: 1734 Credit: 64,228,409 RAC: 0 |
Thanks, Tombei, for the clarification. I looked in /C/programme files/BOINC/ and saw I have no cc_config.xml file (as you had surmised). The contents of your version, modified to fit a Penryn quad and a dual Prestonia Xeon with HT activated (which acts like a quad), should be OK; I will just change the 9 to 5 in ncpus. I will report back on the effect on a quad, and on whether it is equally applicable to a dual HTed P4 machine. Go away, I was asleep |
Joined: 4 Oct 08 Posts: 1734 Credit: 64,228,409 RAC: 0 |
It looks like, in my specific setup (2 ATI GPUs), the cc_config.xml files stop Milkyway running. All the MW work in the BOINC caches is marked ready to run, including the tasks that were crunching. After 50 minutes the crunching has not yet restarted, so I have shut down BOINC, removed the cc_config files, and restarted BOINC. Unfortunately the work is still waiting to crunch, and I hope it will do so in time. Go away, I was asleep |
Joined: 30 Dec 07 Posts: 311 Credit: 149,490,184 RAC: 0 |
Hope you can get it to work. I didn't think of it but I probably shouldn't have included the zero debts part. I just copied and pasted my cc_config.xml as an example. Using the zero debts option may not be compatible with how you have BOINC configured to enable scheduling that works. At least there are tasks available again now for when it starts and you now know how to create a cc_config.xml file if you need one in the future. |
Joined: 3 Oct 07 Posts: 21 Credit: 49,862 RAC: 0 |
"so I have shut down BOINC, removed the cc_config files, and restarted BOINC." You have to shut down BOINC before modifying the file so it can read it on startup. |
Joined: 4 Oct 08 Posts: 1734 Credit: 64,228,409 RAC: 0 |
Did that before including the cc_config file, and again on removal. The AGP HD3850 on the dual HT Xeon is working OK, but the HD4850 on the quaddy is still frozen with work to crunch after 2 hours. When FreeHAL digs out a new bunch of work I will reboot the system. That reboot of the PC sorted things out. Work is flying through the GPU, which suggests these are the short WUs being talked about. Yup! WUs credited 27.80 and 37.18 are currently flying through. They are taking 31 seconds of CPU time, as against 81-84 seconds for the 74.24 credit WUs. I think I will let the RAC build for a day or so, then try copying the cc_config file again. This time I will take out the zero debts option, as suggested by Tombei. Go away, I was asleep |
Joined: 9 Sep 07 Posts: 22 Credit: 320,035 RAC: 0 |
Check your DCF for Aqua. If the DCF is above 90, the task will run in high priority. There's a thread on the Aqua forums about it. Kathryn :o) The BOINC FAQ Service The Unofficial BOINC Wiki The Trac System More BOINC information than you can shake a stick of RAM at. |
Joined: 29 Aug 07 Posts: 81 Credit: 60,360,858 RAC: 0 |
As we all know, boinc devs are more into facebook and other socialnetworkingbuzzcrap, and they have totally forgotten what boinc is about: using available resources for distributed computing. I don't know what exactly NVIDIA did to get into boinc before facebook, but IMO boinc's CUDA support is a mere coincidence, considering its poor implementation and scheduling (which are only now beginning to work). So, ATI... Well, in the next millennium we might get something resembling ATI support, and I am afraid that the human race will never see boinc supporting OpenCL; however, I am not willing to wait that long. Therefore, here is what I plan on trying: 1st: I'll modify the boinc client to allow more instances running, and compile it. 2nd: I'll divide projects by resources (CUDA, ATI, CPU, multi-threaded) and run a separate instance of the boinc client for each of the resources, with an appropriate project set and resource limitations (ncpus = 1 for GPU instances, etc.). One thing I have not resolved yet is how to deal with multi-threaded apps - suggestions are more than welcome. This way I hope to maximize resource utilization, which ATM the latest and most socialnetworkedfacebookedrepublic boinc client just doesn't know how to do. Also, it wouldn't hurt if anyone from the boinc_dev "team" stumbled upon my idea and considered it as a way of scheduling resource utilization in upcoming clients. Suggestions, comments? BR, |
Joined: 29 May 09 Posts: 37 Credit: 34,016,951 RAC: 0 |
Well guys, I may have come up with a simple "workaround" for the MW/Aqua problem. It seems there are a few other folks who are having a lot of trouble keeping MW WUs running alongside Aqua. I too suffer from this problem. After reading this thread (and others) and trying MANY different configuration setups, I have finally come up with something that seems to actually (sorta) work. Please keep in mind, this is not a "fix all" for this problem, but so far it seems to allow MW and Aqua to run together in seeming harmony. To begin with, I had two Aqua WUs: one running in "high priority", the other "waiting to run". Additionally, I had several MW WUs running. When the MW WUs finish in this situation, the manager (usually) does not fetch new work, and even if it does, the newly fetched WUs just sit there "waiting to run" forever, while the finished ones sit there "ready to report" forever. We all hate this, and yes, it can be overcome manually if you plan to be a BOINC manager babysitter. Screw that! Here's what I've done: 1. In the cc_config.xml file, add or edit the line with ncpus to read: <ncpus>5</ncpus> Thank you goes to someone else for suggesting this idea. I'm running a quad core, so this reflects one more CPU than actually exists. Doing just this alone made the manager start running a second Aqua in parallel with the first (that was already running), both running in "high priority". Of course, at this point, there's no hope at all for any MW WUs to ever get done, AND the second Aqua WU runs HORRIBLY slowly; it almost seems to be stalled, though it does progress. Next........ 2. Suspend one of the two running Aquas. Since I already had one of the Aqua WUs at about 25% completion, I suspended the newly started WU. At this point, I let MW continue to run, thinking, "yeah.....right". 
To my surprise, after all the MW WUs completed, uploaded, and were reported, the manager actually went and got more MW WUs, and they actually STARTED on their own! Currently, I have the one Aqua WU running in "high priority", the second "suspended", a full cache of MW WUs, and three WUs running. (I only have a single ATI 4870 in the system ATM; my settings are to run 3 WUs on the single card, and I can only cache 16 WUs at a time.) Eleven MW WUs (normally twelve) are being reported as "running", while only three (which is normal for my setup) are actually progressing. WUs are being reported, and new WUs are being downloaded and set as "Ready to start". When the currently running WUs complete, "Running" WUs that were not progressing begin to show progress (only three at a time, but that is still normal for my setup). "Ready to start" WUs switch to "Running", and everything looks to be quite normal and running very smoothly. On a final note, I don't claim to be a "guru" or "know it all", and someone else may have already suggested this idea; I don't know. What I can tell you is this: for the last hour, as I've watched the progression of WUs through my system, MW WUs and Aquas have been running together without complication. If something falters, I'll let y'all know. Meanwhile, happy crunching! 
Yo- On edit, here's my app_info.xml file: <app_info> <app> <name>milkyway</name> </app> <file_info> <name>astronomy_0.19_ATI_x64f.exe</name> <executable /> </file_info> <file_info> <name>brook.dll</name> <executable /> </file_info> <app_version> <app_name>milkyway</app_name> <version_num>19</version_num> <flops>1.0e11</flops> <avg_ncpus>0.1</avg_ncpus> <max_ncpus>1</max_ncpus> <cmdline>n3 w1.1</cmdline> <file_ref> <file_name>astronomy_0.19_ATI_x64f.exe</file_name> <main_program /> </file_ref> <file_ref> <file_name>brook.dll</file_name> </file_ref> </app_version> </app_info> and my cc_config.xml file: <cc_config> <options> <exclusive_app>ehshell.exe</exclusive_app> <exclusive_app>Crysis64.exe</exclusive_app> <no_gpus>0</no_gpus> <ncpus>5</ncpus> </options> </cc_config> |
Joined: 19 May 09 Posts: 30 Credit: 1,062,540 RAC: 0 |
"I don't know what exactly NVIDIA did to get into boinc before facebook ......." It’s quite simple. They placed rather large amounts of cash into certain people’s pockets. This is a tried and true method to ‘win friends’ and influence things in the direction you would like them to go. Just ask any member of the United States (bought and paid for) Congress. They will attest to the fact that this method works! IE: Rep. Jefferson with 10K in his freezer.
I’d say, go for it. We’ll see if it works. I thought Crunch3R was working on it. Bill |
Joined: 29 May 09 Posts: 37 Credit: 34,016,951 RAC: 0 |
As a follow-up to my previous post in this thread, I have the following to report. After several hours of flawless crunching of MW WUs alongside Aqua, I noted that when the Aqua WU completed, the BOINC manager would not report the completed task on its own, even though the upload was successful. I had hoped that the manager would simply report the completed task and fetch a new Aqua, but it just didn't happen. At this point I removed the remaining Aqua from suspension, thinking that might do the trick, but alas, that didn't work either. The Aqua WU started running, but the manager still wouldn't report the other completed WU. This left only one choice. I had to suspend MW for a brief period and click the "Update" button to force the manager to report the Aqua WU and fetch another WU, which is exactly what happened. Once the new Aqua was downloaded, I suspended it and then removed MW from suspension. Everything went back to running as described in my previous post. Having to do this "trickery" every eight to ten hours is much more manageable than having to do it every twenty to forty minutes by manipulating MW to do the same thing. I noticed that after doing this, some of the MW WUs were sitting there "waiting to run", which had me a little worried, but eventually everything came back into play, and now I can go back to bed. Yo- |
Joined: 29 Jul 08 Posts: 12 Credit: 60,445,018 RAC: 0 |
It is also possible to overcome many of these problems by running a virtual machine with VMware, VirtualBox or even Virtual PC. Then you have two independent BOINC installations that know nothing about each other and will happily crunch one project each: MW on the native OS with ATI, the second project under the VM. The result is two or more projects running on a single physical host, which is of course reported as two hosts in your accounts and stats. You need to accept a small overhead loss due to the VM, but personally I've found this better than having to micromanage BOINC. Symington weather report and video feed |
Joined: 29 Aug 07 Posts: 81 Credit: 60,360,858 RAC: 0 |
"It is also possible to overcome many of these problems by running a virtual machine with VMware, VirtualBox or even Virtual PC. Then you have two independent BOINC installations that know nothing about each other and will happily crunch one project each." Hey. I already tried that, but Linux running BOINC in VirtualBox would crash every time I assigned more than one CPU to it. Now, I wouldn't like to run a VM for each processor. (If anyone is interested, I tried it on my q9450@3.2GHz, 2G RAM, ATI 4870 1G, VBox 3, WinXP 64 host OS, Ubuntu 9.04 guest OS.) I am now in the process of acquainting myself with the BOINC code; however, progress is slow, as I have a lot of work at my job and this week I am also dog/house-sitting for a friend. Anyway, there seems to be no work available here again, so I guess there is no need for me to rush. BR, |
Joined: 29 May 09 Posts: 37 Credit: 34,016,951 RAC: 0 |
Personally, I hate having to micromanage the BOINC manager. Apparently, this really seems to be a manager "conflict of interest" or some such thing. Some would call it a bug. Others tend to think it's the WUs of different projects causing the problems, but I am convinced it's strictly a problem with the manager itself and how it handles the projects. Might there be an alternative manager that we can run these projects on that doesn't exhibit these problems? Or is this something that simply needs to be fixed? In any case, we shouldn't have to deal with these headaches. In fact, the BOINC people and the project people that come up with stuff should be paying us for doing their work for them. I have to pay dearly for the electricity to run these projects, and I do this out of the kindness of my heart and the belief that these projects will one day help the world as a whole, if they don't already. The least they could do is give us a manager that actually runs the projects the way we'd like it to. Half the settings don't seem to do anything, the rest seem to be broken, and those that actually DO do something seem to do very little. Rant - disabled Yo- :) |
Joined: 29 Aug 07 Posts: 81 Credit: 60,360,858 RAC: 0 |
Look what I just found in checkin notes: David May 12 2008 Now, on to try it out. [edit]checkin quote[/edit] BR, |
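
For anyone who wants to try the cc_config.xml change discussed in this thread (telling BOINC it has one more CPU than really exists), here is a minimal Python sketch that only generates the file contents. The function names `ncpus_with_spare` and `build_cc_config` are made up for this illustration and are not part of BOINC. It writes only the <ncpus> option and deliberately leaves out <zero_debts>, since that option was flagged above as possibly unsuitable. Whether the extra "CPU" actually keeps MilkyWay ATI tasks running will depend on your BOINC version and setup, as the mixed reports above show.

```python
import os
import xml.etree.ElementTree as ET


def ncpus_with_spare(real_cpus, spare=1):
    """Return the CPU count to report to BOINC: the real count plus `spare`.

    Hypothetical helper: e.g. 4 real CPUs -> 5, the quad-core setting used above.
    """
    return real_cpus + spare


def build_cc_config(ncpus):
    """Return cc_config.xml text containing only the <ncpus> option."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "ncpus").text = str(ncpus)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # os.cpu_count() counts logical processors (8 on the hyper-threaded i7
    # mentioned above); it can return None, so fall back to 1.
    real = os.cpu_count() or 1
    print(build_cc_config(ncpus_with_spare(real)))
```

The script just prints the XML. Save that output as cc_config.xml in the BOINC data directory (C:\ProgramData\BOINC in the posts above), then either use "Read config file" in BOINC Manager or restart BOINC, whichever works on your client.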