Some feedback on Milkyway GPU crunching with various GPUs
Joined: 5 Oct 09 Posts: 22 Credit: 22,661,352 RAC: 0
Hello everybody, I just installed Milkyway on a few of my rigs and wanted to let you know how it goes for me (everyone can use some positive feedback for a change, can't they? ;) )

First of all, let me say I am delighted to have found a project that actually WORKS on ATI GPUs without all the stupid issues, requirements and workarounds needed elsewhere (see FAH). All I had to do to get them to work was:
1. Update my BOINC to the latest 6.10.x beta client
2. Attach to Milkyway@home (obviously) and set my preferences to use the GPU while PCs are in use, and NOT use the CPUs (my CPUs are all taken for World Community Grid)
3. Close down BOINC and all BOINC services
4. Download the 0.20 ATI Win64 client from here and put it in the project data directory
5. Copy and rename the 3 aticalxxxx.dll files from the system32 directory to amdcalxxxx.dll (so that I had both the atical and amdcal files in there, for a total of 6 files)
6. Start up BOINC again and let it rip ;)

So far, I have tried this on:
- HD4870X2 running Server 08 x64, Catalyst 9.9
- HD4870 1GB running Vista HP 64, Catalyst 9.9
- HD4850 512MB running Server 08 x64, Catalyst 9.6

Works on all machines with incredible performance (47s per WU per GPU on the 4870s, 54s on the 4850) :) No errors so far, it seems to work great. What is truly awesome: some machines, for example the HD4850, are not connected to any peripherals. I control them over LAN using UltraVNC, which poses a problem for most GPGPU apps. Not so for Milkyway, which works just fine without any screens attached.

One issue I had with the 4870X2: it jumps back and forth between 2D and 3D clocks every time a new WU is started, causing the screen to flicker and flash. Also, the 2nd GPU doesn't come out of 2D clocks. Solution: simply create a CCC profile with identical 2D and 3D clock rates and voltages (there are guides around the net on how to do that). Activate that profile before you start BOINC/Milkyway and you'll be fine: no screen flickering, and 2 WUs every 47 seconds :D

Now, I also tried this on an Nvidia rig running 2x GTX 280s in SLI. Here, it works even better (foolproof, really). I just attached to Milkyway, and it automatically downloaded all the apps and DLLs and whatnot, then started running just fine on both GPUs. I have to say the performance is way better on AMD, though; the GTX 280s take around 3 minutes per WU, which is rather slow compared to the ATIs. OS is Vista 64 Ultimate, driver is 190.62. SLI was enabled, no CUDA Toolkit or anything else installed.

Critique: well, there has to be some, right?! And there is... small points, but valid ones IMO. Three things that could be improved, in my opinion:
1. Make ATI cards work automatically, just like NV GPUs. While this is relatively easy for someone like me, who has suffered through FAH manual install routines, the steps necessary to get an ATI GPU crunching may still be too much of a hassle, causing many people to abandon the idea of joining the project. As I said, without the manual copying and the Catalyst workaround, Milkyway wouldn't work on any of my ATI GPUs.
2. Seriously, make the WUs bigger. 47s on a HD4870 is WAY too short. It clogs my BOINC logs, and performance is lost to the frequent switching between WUs (it takes a second or so for the next WU to start). Just imagine what these WUs will be like on the upcoming HD5000 series.
3. Credit system. The PPD Milkyway is claiming are extremely inflated and WAY over the top. One HD4870 can do around 100k BOINC PPD on this project. For comparison, I have around 12 state-of-the-art rigs running on World Community Grid, half of them dual or quad socket, all overclocked. Not counting Hyperthreading, they surpass the 200 GHz mark combined. This extensive farm does around 30k BOINC PPD ;)

Finally, I will be getting a HD5870 shortly. Interested to see what it can do.. probably 20s WUs, LOL!
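For anyone who wants to script the DLL copy in step 5 instead of doing it by hand, a minimal Python sketch (the `mirror_cal_dlls` helper is purely illustrative, not part of the project; the atical* names are whatever the Catalyst runtime put in your system32, so check there first, and run with admin rights):

```python
import shutil
from pathlib import Path

def mirror_cal_dlls(directory: str) -> list[str]:
    """For every atical*.dll in `directory`, create an amdcal*.dll copy
    next to it (skipping copies that already exist).

    Returns the names of the amdcal twins. Hypothetical helper for the
    manual step described above; point it at your system32 directory.
    """
    twins = []
    for src in Path(directory).glob("atical*.dll"):
        dst = src.with_name(src.name.replace("atical", "amdcal", 1))
        if not dst.exists():
            shutil.copy2(src, dst)  # copy2 preserves timestamps/metadata
        twins.append(dst.name)
    return twins
```

After running it against system32 you should end up with the 6 files (3 atical + 3 amdcal) the post mentions.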
Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0
First of all, let me say I am delighted to have found a project that actually WORKS on ATI GPUs without all the stupid issues, requirements and workarounds needed (see FAH).

Thanks!

1. Make ATI cards work automatically, just like NV GPUs. While this is relatively easy for someone like me, who has suffered through FAH manual install routines, the steps necessary to get an ATI GPU crunching may still be too much of a hassle, causing many people to abandon the idea of joining the project.

That is work the project has to do. I have already provided them with the necessary applications to enable automatic downloads of the appropriate version without the need to copy the DLLs. Guess we have to wait a bit for the implementation.

3. Credit system. The PPD Milkyway is claiming are extremely inflated and WAY over the top. One HD4870 can do around 100k BOINC PPD on this project. For comparison, I have around 12 state-of-the-art rigs running on World Community Grid, half of them dual or quad socket, all overclocked. Not counting Hyperthreading, they surpass the 200 GHz mark combined. This extensive farm does around 30k BOINC PPD ;)

Have you tried crunching the WUs on the CPUs? The GPUs take the exact same WUs and just churn out the results so much faster. So you have to decide whether you want to give the CPU guys roughly the same amount of credit as other projects do; that is what leads to the high pay for GPUs you see now. If one divided the credits per WU by two, for instance, the pay on CPUs would be significantly lower than on other projects. The Milkyway GPU applications scale very well with the processing power these GPUs have relative to CPUs. One simply gets near-perfect efficiency, in stark contrast to some other projects that have tried GPU applications.

Even comparing it to the (well optimized) GPUGrid application, for instance, one sees that the ATI cards achieve a higher number of calculated operations in double precision here than a GTX285 manages over at GPUGrid in single precision, even though there is a severe penalty for double precision on current GPUs (actually much higher on nvidia cards, which is the reason a GTX280 trails even an old and rusty HD3850). The same effect can be seen at the Collatz Conjecture project (automatic downloads of the ATI application work there with BOINC 6.10.8+, even the HD2000 line and IGPs starting with the HD3200 run quite well, and the WUs are also longer). The CPU application is far from bad (I have written both the CPU and the ATI application). It even uses assembler for the critical loop to get the work done as fast as possible (and 64bit is twice as fast as 32bit). But still, even a HD3200/3300/4200 IGP (780G/790GX/785G chipset) starts to rival the throughput of a quad core CPU. If one has a truly parallel task, nothing comes close to GPUs with hundreds of parallel units instead of just the 4 cores of a CPU. That's simply the way it is. GPUs really are as much faster than CPUs in both projects as the credits indicate.

Finally, I will be getting a HD5870 shortly. Interested to see what it can do.. probably 20s WUs, LOL

Quite a good guess! You can see in this thread what others are getting. As most HD5870s appear to handle an overclock into the 950MHz range, even sub-20-second times may be possible ;)
Joined: 6 Apr 08 Posts: 2018 Credit: 100,142,856 RAC: 0
Joined: 5 Oct 09 Posts: 22 Credit: 22,661,352 RAC: 0
Hi, I think you missed my point. It's not that the GPU units are faster than the CPU units; I was simply stating that, compared to other (bigger) projects like World Community Grid, the PPD Milkyway generates is simply way out of line, for both the CPU and GPU WUs. GPUGrid has a similar "problem", but it's not nearly as grave (my overclocked GTX 260-216 on GPUGrid does around 15k PPD). It's just bad for comparison, that's all. I take it the unmistakable irony in Ice's post above this one means I'm not alone with this feeling ;) Got my 5870 lying right here, by the way; now to find a vacant PCIe slot.. :D
Joined: 24 Dec 07 Posts: 1947 Credit: 240,884,648 RAC: 0
I think you missed the point. You need to compare a stock CPU app against other projects' stock CPU apps. It just so happens that a volunteer has now optimised that stock app and compiled it for use on GPU... that is the beauty of a project that opens up its source code for optimisation by the project volunteers. Those projects that don't or can't do so suffer, but they will just have to rely on the folks who do GPU crunching to put their CPUs to use on those projects. What the heck, the loss of a couple thousand credits per day by doing work for MalariaControl on the CPU is no biggie when you're doing 150k+.....
Joined: 5 Oct 09 Posts: 22 Credit: 22,661,352 RAC: 0
I did.. one WU gives 53 BOINC credits, right? I crunched one in the stock app; it took 20 minutes or so, running on 1 thread of my Gulftown CPU @ 4 GHz. If that is 53 credits per 20 minutes, you get approx. 159 BOINC credits per hour per core/thread, right? Well, on the average WCG project you get something like 100 BOINC credits for a 4h-runtime WU (on the same CPU). And then along comes the quorum and cuts it down some, so you're effectively getting 75 credits or so for a 4h WU. So we're looking at 636 vs. 75 credits for the same 4 hours of stock CPU runtime, meaning Milkyway claims roughly 8.5 times what WCG pays, WITHOUT the optimized apps. See where this is going :D Now, I really don't care about getting the most points (or I wouldn't run WCG, obviously), I just think it would be nice if BOINC projects tried to be at least halfway comparable to each other.
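The arithmetic above can be checked in a few lines (the 53-credit, 20-minute, and 75-credit figures are the ones quoted in the post, not official project numbers):

```python
# Milkyway stock CPU app: 53 credits per WU, ~20 minutes per WU on one thread.
mw_credits_per_wu = 53
mw_minutes_per_wu = 20
mw_per_hour = mw_credits_per_wu * 60 / mw_minutes_per_wu  # 159 credits/hour

# WCG after quorum reduction: ~75 credits for a 4-hour WU on the same CPU.
wcg_credits_per_wu = 75
wcg_hours_per_wu = 4
wcg_per_hour = wcg_credits_per_wu / wcg_hours_per_wu      # 18.75 credits/hour

# Over the same 4 hours: 636 vs. 75 credits, a ratio of about 8.5x.
mw_per_4h = mw_per_hour * wcg_hours_per_wu
ratio = mw_per_hour / wcg_per_hour
```

So 636 vs. 75 credits over 4 hours works out to a factor of 8.48, which matches the roughly 8.5x claim.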
Joined: 24 Dec 07 Posts: 1947 Credit: 240,884,648 RAC: 0
20 minutes sounds like an optimised cpu app. See this host.
Joined: 21 Aug 08 Posts: 625 Credit: 558,425 RAC: 0
Bah Humbug... The better thing to do is to not compare projects at all, but there are still tons of people who cling tenaciously to this broken credit system and the pipe dream that "parity" can be salvaged out of this wreck... However, per "The Bible of BOINC Cross Project Parity" located here, Milkyway is currently paying less than SETI, just not as much less than SETI as WCG is... if one believes in such silly charts... given that the same silly chart says WCG is paying out slightly more than Rosetta, yet Rosetta is closer to parity with SETI than WCG is (it should show Rosetta even lower compared to SETI, but it shows it higher)... Ah, the follies of fall are upon us... Credit parity... Postings of graphs that backfire... Whatever will be next...?
Joined: 5 Oct 09 Posts: 22 Credit: 22,661,352 RAC: 0
20 minutes sounds like an optimised cpu app. See this host.

Naw, just an unreleased CPU ;)

Bah Humbug...

You are probably right; best to simply ignore the effed up credit system, since I/we can't change it anyway. It just never really occurred to me that there are differences this huge between the projects. By the way, I just installed my HD5870. Running a 900 MHz core @ 1.1 V (so slightly undervolted), it's doing the WUs in 21s on average. 53 credits every 21s.. I'm gonna need a LONG time to get used to that lol.
Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0
20 minutes sounds like an optimised cpu app. See this host.

Looks like an engineering sample of the hexacore Gulftown, doesn't it? Damn, I should output some more information from my CPU detection library ;). It's all in there, just not written to the file. By the way, judging from the reported times, your HD5870 actually does the WUs in just below 20 seconds. All the other stuff is just the handover between different WUs and some CPU calculations before or after the GPU work. You can hide the ~2 seconds the GPU is idle by running 2 or 3 WUs on the GPU (and waiting some minutes until they no longer all start at exactly the same instant). PS: You don't have a Larrabee at hand? What about trying it on that? ;)

Running Milkyway@home ATI GPU application version 0.20 (Win64) by Gipsel
Joined: 5 Oct 09 Posts: 22 Credit: 22,661,352 RAC: 0
Yes, I've got a few Gulftowns running already; real crunching monsters they are ;) No Larrabee here - yet ;) And you're right, I just checked the results page and it said 19.6-19.8s per WU on the 5870. I tried running 2 WUs on one GPU with the appropriate command-line argument on a HD4870 yesterday, but it still wouldn't do 2 WUs at the same time, so I figured it doesn't work... but maybe I was doing something wrong; gonna try again tomorrow (1:30am here)
Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0
I tried running 2 WUs on one GPU with the appropriate command-line argument on a HD4870 yesterday, but it still wouldn't do 2 WUs at the same time, so I figured it doesn't work... but maybe I was doing something wrong; gonna try again tomorrow (1:30am here)

I know, we are in the same time zone ;) To get more than a single WU running per GPU you don't have to use the command-line options. To quote the readme of the new 0.20b version: "Configuration of the number of concurrent WUs with new BOINC clients (6.10.3 and up):" And don't use the n parameter if you have a 6.10.x BOINC client!
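For 6.10.x clients, the usual mechanism for running two WUs per GPU is an app_info.xml entry that declares the app as needing only half a GPU per task. A hypothetical fragment along those lines (the executable name and version number here are illustrative only; use the actual names and values from the 0.20b readme):

```xml
<app_info>
  <app>
    <name>milkyway</name>
  </app>
  <file_info>
    <name>astronomy_0.20b_ATI_x64.exe</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>milkyway</app_name>
    <version_num>20</version_num>
    <file_ref>
      <file_name>astronomy_0.20b_ATI_x64.exe</file_name>
      <main_program/>
    </file_ref>
    <coproc>
      <type>ATI</type>
      <!-- half a GPU per task, so BOINC schedules two tasks per GPU -->
      <count>0.5</count>
    </coproc>
  </app_version>
</app_info>
```

With `count` set to 0.5 the client reports the app as using "0.5 GPUs" and starts two tasks per board.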
Joined: 5 Oct 09 Posts: 22 Credit: 22,661,352 RAC: 0
Ah, okay. I don't even have 0.20b yet; time for an update already, I guess ;) Yes, I was using the n parameter because that was in the readme of the version I downloaded a few days back. That explains why it wouldn't work ;) Where in Germany do you live, by the way? I'm from Stuttgart ;)
Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0
Where in Germany do you live, by the way? I'm from Stuttgart ;)

At the opposite end of Germany, in the northeast rather than the southwest. To be more specific: Rostock.
Joined: 6 Oct 09 Posts: 39 Credit: 78,881,405 RAC: 0
Is there any workaround to prevent this? I'm seeing the same with my SLI 260s and have had to switch to a project with longer WUs whenever I am using my PC.
Joined: 19 May 09 Posts: 30 Credit: 1,062,540 RAC: 0
I lived in Neu-Ulm for 2 years. What a beautiful country! (Deutschland) Cheers, Bill
Joined: 5 Oct 09 Posts: 22 Credit: 22,661,352 RAC: 0
Where in Germany do you live, by the way? I'm from Stuttgart ;)

OK, I just updated the 5870 rig to 0.20b and set it to 0.5 GPUs; now it's crunching 2 WUs simultaneously without the drops to 50%. BUT: I only see ~90% GPU load instead of the 99% I had previously. I will try setting w0.9 to see if this improves the load. BTW, do you have ICQ or something? If so, feel free to message me: 170279477 ;)
Joined: 5 Oct 09 Posts: 22 Credit: 22,661,352 RAC: 0
Just reverted back to 0.20, because 0.20b only gives me ~90% GPU load (no matter what w or p settings I try or how many WUs I run concurrently) while 0.20 gives me 98-99% at least. The only way to get to 99% on 0.20b is to pause the CPU WUs, which is not an option, of course. Ninja edit: running 2 WUs on 0.20 works way better. I am now seeing GPU usage between 85 and 99%, and no more drops to 50%. But something is definitely off with 0.20b (for me, at least)
Joined: 9 Feb 09 Posts: 166 Credit: 27,520,813 RAC: 0
3. Credit system. The PPD Milkyway is claiming are extremely inflated and WAY over the top.

I agree with you, Ice: people running it on a NORMAL CPU suffer, and the recent lowering hurt my RAC a lot ;) Of course I am not one of the top dogs in DC country, but still, all who have super CPUs and/or run on super GPUs have a nice output. But to state the project is overinflated... pff, we should get 10x more points than we get now :D Just my two cents for DC. GPUs are the future: it's new, it's relatively fast... my new bicycle
Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0
Just reverted back to 0.20 because 0.20b only gives me ~90% GPU load (no matter what w or p settings I try or how many WUs I run concurrently) while 0.20 gives me 98-99% at least.

Yes, it appears that on some systems the app behaves a bit differently than on my test rigs (WinXP32 and WinXP64). But I think you can force 0.20b to the same behaviour as 0.20 by setting the new parameter b-1 (b minus one).
©2024 Astroinformatics Group