GPU app teaser

L@MiR Joined: 21 Jan 09 Posts: 5 Credit: 14,572,804 RAC: 0	Message 9394 - Posted: 30 Jan 2009, 2:20:27 UTC Last modified: 30 Jan 2009, 2:28:41 UTC
No, it's here: http://www.planet3dnow.de/vbulletin/showpost.php?p=3850506&postcount=762
Click "die neue Version (V3)" ("the new version (V3)").
And please report feedback in this thread: http://www.planet3dnow.de/vbulletin/showthread.php?t=353616 ;) (reports in English are welcome too)

jedirock Joined: 8 Nov 08 Posts: 178 Credit: 6,140,854 RAC: 0	Message 9398 - Posted: 30 Jan 2009, 2:28:11 UTC - in response to Message 9394.
> No, it's here: http://www.planet3dnow.de/vbulletin/showpost.php?p=3850506&postcount=762 click: "die neue Version (V3)" And report in this thread http://www.planet3dnow.de/vbulletin/showthread.php?t=353616 please ;) (also in English)
Heh, that was quick. And of course, I can't actually use the application until Saturday... I'll keep an eye on this thread, and start testing as soon as I can.

L@MiR Joined: 21 Jan 09 Posts: 5 Credit: 14,572,804 RAC: 0	Message 9400 - Posted: 30 Jan 2009, 2:33:16 UTC - in response to Message 9398. Last modified: 30 Jan 2009, 2:35:47 UTC
> ... Heh, that was quick. ..
Pure coincidence. Good Night... @03:32 ... in Germany...

Cluster Physik Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0	Message 9401 - Posted: 30 Jan 2009, 2:56:33 UTC - in response to Message 9380. Last modified: 30 Jan 2009, 3:00:00 UTC
> Is there any way to get the application now, or is it just open to Planet 3DNow members?
If you browse our forum, you will find it. It's not in some closed forum where you have to log in or even have to be a member of the team. You will also find it as a guest, and now even more easily, as Emploi posted the link ;) But it would be helpful if you could read some German, as there are instructions in that thread (you may need to edit the included app_info.xml).
It is still a test version (think of it as an alpha version or technology demonstrator; it is not a release candidate yet) and some things are simply not working at the moment. I'm in contact with Travis about it as well. My plan is to give the app, including the code, to the project when it is working as desired and can be distributed as the stock GPU application of Milkyway.
And if you are interested in your credit standings, you had better not run this app with the current credit and WU limits. A quad core with a HD4870 gets only about 330 credits a day, a dual core only 165 credits or so (as the number of WUs you get scales with the number of your cores, even if the calculation speed does not depend on them). It is just pure enthusiasm that lets one lone guy run that app, not the credits. Furthermore, we simply need to test it, so I'm really thankful for that (as I don't have a compatible card).

jedirock Joined: 8 Nov 08 Posts: 178 Credit: 6,140,854 RAC: 0	Message 9404 - Posted: 30 Jan 2009, 3:42:25 UTC - in response to Message 9401.
>> Is there any way to get the application now, or is it just open to Planet 3DNow members?
> If you browse our forum, you will find it. It's not in some closed forum where you have to log in or even have to be a member of the team. You will also find it as a guest, and now even more easily, as Emploi posted the link ;) But it would be helpful if you could read some German, as there are instructions in that thread (you may need to edit the included app_info.xml).
> It is still a test version (think of it as an alpha version or technology demonstrator; it is not a release candidate yet) and some things are simply not working at the moment. I'm in contact with Travis about it as well. My plan is to give the app, including the code, to the project when it is working as desired and can be distributed as the stock GPU application of Milkyway.
> And if you are interested in your credit standings, you had better not run this app with the current credit and WU limits. A quad core with a HD4870 gets only about 330 credits a day, a dual core only 165 credits or so (as the number of WUs you get scales with the number of your cores, even if the calculation speed does not depend on them). It is just pure enthusiasm that lets one lone guy run that app, not the credits. Furthermore, we simply need to test it, so I'm really thankful for that (as I don't have a compatible card).
Understood. I know it's just for testing, but it also helps to heat up the room here. :-P The only thing is I don't speak any German, but I've tweaked plenty of app_info.xml files before, so I can get it running. Is it fine if I were to just report anything I find here, or should I still get an account on Planet 3DNow's forums?

Cluster Physik Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0	Message 9408 - Posted: 30 Jan 2009, 6:14:51 UTC - in response to Message 9404. Last modified: 30 Jan 2009, 6:18:03 UTC
> Understood. I know it's just for testing, but it also helps to heat up the room here. :-P The only thing is I don't speak any German, but I've tweaked plenty of app_info.xml files before, so I can get it running. Is it fine if I were to just report anything I find here, or should I still get an account on Planet 3DNow's forums?
I guess it does not matter too much, you can also ask here.
For starters, the content of the app_info.xml decides not only whether it runs, but also how it runs. With the values avg_ncpus and max_ncpus and the ncpus value of your cc_config.xml you can control (together with the resource share, if you run other projects at the same time) how many MW WUs run concurrently. The app will use one core, no matter how many WUs are calculated in parallel. Running two at once could give slightly better throughput (think about saving 0.2 seconds per WU or so); running more could give diminishing returns. Especially if you set the avg_ncpus value very low, it may become slower.
I would suggest setting the ncpus value of your cc_config.xml to your actual number of cores + 1, setting avg_ncpus and max_ncpus both to 1, and then choosing the resource share of Milkyway such that 2 WUs run concurrently. Example: quad core, ncpus in cc_config.xml set to 5 (BOINC then calculates 5 WUs in parallel in total) and MW resource share set to 2/5 = 40%. Milkyway will then effectively use one core while calculating 2 MW WUs in parallel, and three cores are left for other things.
You can also test a very low resource share for MW together with a low avg_ncpus value in the app_info.xml (like 0.25). The slower the card, the lower you can set this before losing too much performance. 0.2 should be the absolute minimum for a HD4870; a HD3850 may also run quite well with 0.1.
These values get really interesting if/when I release the CPU during the computation.
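The arithmetic behind the suggestion above can be sketched in a few lines of Python. This is only an illustration of the rule of thumb from the post (ncpus = real cores + 1, avg_ncpus = max_ncpus = 1, resource share chosen so that 2 MW WUs run at once); the function name `suggest_settings` and the dictionary keys are made up for this example, and it assumes BOINC hands out its ncpus slots in proportion to the resource share, as the post describes.

```python
def suggest_settings(physical_cores, mw_concurrent=2):
    """Return the settings suggested in the post for a given core count.

    Hypothetical helper: it only reproduces the rule of thumb from the
    thread and does not read or write any BOINC file.
    """
    ncpus = physical_cores + 1                    # cc_config.xml: real cores + 1
    share_percent = 100 * mw_concurrent / ncpus   # e.g. 2 of 5 slots = 40%
    return {
        "ncpus": ncpus,
        "avg_ncpus": 1,                           # app_info.xml
        "max_ncpus": 1,                           # app_info.xml
        "mw_resource_share_percent": share_percent,
        "slots_left_for_other_projects": ncpus - mw_concurrent,
    }


if __name__ == "__main__":
    # Quad core, as in the example from the post.
    print(suggest_settings(4))
```

For a quad core this gives ncpus = 5, a Milkyway share of 40% and three slots for other projects, matching the example in the post.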

jedirock Joined: 8 Nov 08 Posts: 178 Credit: 6,140,854 RAC: 0	Message 9417 - Posted: 30 Jan 2009, 14:29:46 UTC - in response to Message 9408.
>> Understood. I know it's just for testing, but it also helps to heat up the room here. :-P The only thing is I don't speak any German, but I've tweaked plenty of app_info.xml files before, so I can get it running. Is it fine if I were to just report anything I find here, or should I still get an account on Planet 3DNow's forums?
> I guess it does not matter too much, you can also ask here.
> For starters, the content of the app_info.xml decides not only whether it runs, but also how it runs. With the values avg_ncpus and max_ncpus and the ncpus value of your cc_config.xml you can control (together with the resource share, if you run other projects at the same time) how many MW WUs run concurrently. The app will use one core, no matter how many WUs are calculated in parallel. Running two at once could give slightly better throughput (think about saving 0.2 seconds per WU or so); running more could give diminishing returns. Especially if you set the avg_ncpus value very low, it may become slower.
> I would suggest setting the ncpus value of your cc_config.xml to your actual number of cores + 1, setting avg_ncpus and max_ncpus both to 1, and then choosing the resource share of Milkyway such that 2 WUs run concurrently. Example: quad core, ncpus in cc_config.xml set to 5 (BOINC then calculates 5 WUs in parallel in total) and MW resource share set to 2/5 = 40%. Milkyway will then effectively use one core while calculating 2 MW WUs in parallel, and three cores are left for other things.
> You can also test a very low resource share for MW together with a low avg_ncpus value in the app_info.xml (like 0.25). The slower the card, the lower you can set this before losing too much performance. 0.2 should be the absolute minimum for a HD4870; a HD3850 may also run quite well with 0.1.
> These values get really interesting if/when I release the CPU during the computation.
All right, sounds cool. The reason I can't run the application right now is that I blew a fuse in the power supply (not from overcurrent, it was stupidity on my part), and I'll have a new one for Saturday (tomorrow). It is indeed a quad-core: an overclocked Q6600 to be exact. So I'll try the values you suggested once that's back up.

Honza Joined: 28 Aug 07 Posts: 31 Credit: 86,152,236 RAC: 0	Message 9432 - Posted: 31 Jan 2009, 10:34:20 UTC
Tested the GPU app on a HD3870. Performance is far superior to any CPU, no doubt. I've noticed that some WUs are "not compatible, falling back to a somewhat slow CPU code." For example, nm_s79 and nm_s86 are run on the GPU, but nm_s20 and nm_s21 are run on the CPU. Is it a known issue, or can some types of WUs not be completed on a GPU at all?

Cluster Physik Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0	Message 9434 - Posted: 31 Jan 2009, 11:40:17 UTC - in response to Message 9432. Last modified: 31 Jan 2009, 12:10:23 UTC
> Tested the GPU app on a HD3870. Performance is far superior to any CPU, no doubt. I've noticed that some WUs are "not compatible, falling back to a somewhat slow CPU code." For example, nm_s79 and nm_s86 are run on the GPU, but nm_s20 and nm_s21 are run on the CPU. Is it a known issue, or can some types of WUs not be completed on a GPU at all?
That's what I'm working on. I've already said some things are simply not working yet. Do you remember the news about the new WUs with two streams in them? They can't be calculated on the GPU at the moment. The flop counting for this type of WU is also wrong. If I find some time I will extend the GPU code a bit over the weekend. But as you have seen, nothing terrible happens; it just gets slower when the app falls back to the CPU. I'm quite happy that it works as intended. I had no idea that a new type of WU would be introduced, so I'm glad I prepared for that ;)
Could you please post a link to your machine (or one or two results) with a short description of your settings (resource share of MW, changes to the app_info.xml or cc_config.xml)? It should take a bit less than 30 seconds per WU on a 3870, right?
Edit: Ahh, found it here. 25 seconds for a 3870 is okay. The difference to the 9.x seconds of the HD4870 corresponds roughly to the difference in the number of stream processors of the two GPUs. The shader power was raised by a factor of 2.5 between the generations (besides some other tweaks). Have you played around a bit with the settings in the app_info.xml, cc_config.xml and the resource share? By the way, it is not a fair comparison, but a HD4870 here at Milkyway is doing more double precision operations per second than a GTX280 is doing single precision operations with the CUDA application of SETI.
Edit2: I see you have updated the client from 5.10.30 to 6.6.3 before running the GPU app. This should have been unnecessary, as the GPU stuff in there is only for nvidia cards and is not used at all.
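The scaling argument in the edit above can be checked with a short calculation. This is a rough sketch, not a benchmark: the stream processor counts (320 for the HD3870, 800 for the HD4870) are taken from the public specifications of the cards rather than from this thread, and clock speeds and the "other tweaks" are ignored.

```python
# Assumption: stream processor counts from public spec sheets, not from the thread.
HD3870_STREAM_PROCESSORS = 320
HD4870_STREAM_PROCESSORS = 800

# Time per WU on the HD3870 as reported in the post above.
t_hd3870_seconds = 25.0

# Ratio of shader counts between the two generations.
shader_ratio = HD4870_STREAM_PROCESSORS / HD3870_STREAM_PROCESSORS

# If run time scaled purely with shader count, the HD4870 would need:
predicted_hd4870_seconds = t_hd3870_seconds / shader_ratio

print(f"shader ratio: {shader_ratio}")
print(f"predicted HD4870 time per WU: {predicted_hd4870_seconds} s")
```

The prediction of about 10 seconds is close to the 9.x seconds reported for the HD4870, which is what "corresponds roughly" refers to.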

Honza Joined: 28 Aug 07 Posts: 31 Credit: 86,152,236 RAC: 0	Message 9436 - Posted: 31 Jan 2009, 12:32:29 UTC - in response to Message 9434.
> Have you played around a bit with the settings in the app_info.xml, cc_config.xml and the resource share?
> I see you have updated the client from 5.10.30 to 6.6.3 before running the GPU app. This should have been unnecessary, as the GPU stuff in there is only for nvidia cards and is not used at all.
Thanks for your comments. Yes, I've played with app_info.xml and cc_config.xml. Since there is no way to split computing into 4 CPU tasks and n GPU tasks (none that I'm aware of), I have settled on ncpus=5 and set both avg_ncpus and max_ncpus to 1. This way the GPU is frequently idle, so setting avg_ncpus=0.5 would give the GPU a better chance to be used, but may slow down the CPU, which would be dealing with too many tasks. No exact measurement of CPU performance was done for the Q9550 doing 4 or 8 tasks at once. Once your GPU app is ready for other WU types, it would make more sense to play with the settings and resource share in order to have the GPU do MW and leave the CPUs to other projects.
I'm using the rather outdated 5.10.30 BOINC Studio core since it has the backup project(s) ability. It is still left on my SSD with the original projects configuration. There is nothing really important/interesting in 6.x to justify upgrading dozens of hosts (no multithreading project apps available, for example) and GPU support is far from bug free. 6.6.3 is a fresh install with only MW attached to play with. I may revert back to BS 5.10.30 completely.

Emanuel Joined: 18 Nov 07 Posts: 280 Credit: 2,442,757 RAC: 0	Message 9443 - Posted: 31 Jan 2009, 16:21:15 UTC Last modified: 31 Jan 2009, 16:22:16 UTC
Thank you for all your work on this. I must say it's good to hear AMD/ATI's cards are performing so well now, considering they've been playing catchup for a while (from what I can gather, anyway). Is the code very different from the CUDA equivalent? And would a CUDA conversion be worth it? (I heard the GTX280 is much better at double precision operations than earlier cards, but it sounds like even it is outmatched by current offerings by AMD)

Paul D. Buck Joined: 12 Apr 08 Posts: 621 Credit: 161,934,067 RAC: 0	Message 9445 - Posted: 31 Jan 2009, 16:50:42 UTC - in response to Message 9443.
> Thank you for all your work on this. I must say it's good to hear AMD/ATI's cards are performing so well now, considering they've been playing catchup for a while (from what I can gather, anyway). Is the code very different from the CUDA equivalent? And would a CUDA conversion be worth it? (I heard the GTX280 is much better at double precision operations than earlier cards, but it sounds like even it is outmatched by current offerings by AMD)
Like the Intel / AMD wars, the ATI and Nvidia wars have first one, then the other out in front... I am sure that if this application gets going here and demonstrates the capabilities of the ATI cards, Nvidia will notice and respond in the next generation. And if it does prove out, I can still get a couple of ATI cards to go in two of my systems, though I will be Nvidia heavy for a bit ... But first, we have to have the application ... :)

Cluster Physik Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0	Message 9465 - Posted: 31 Jan 2009, 22:41:10 UTC - in response to Message 9443.
> Is the code very different from the CUDA equivalent? And would a CUDA conversion be worth it? (I heard the GTX280 is much better at double precision operations than earlier cards, but it sounds like even it is outmatched by current offerings by AMD)
No, the general programming principle is quite similar for ATI and nvidia. There are of course some differences in how to obtain maximum performance, but luckily some of them apply only to single precision (ATI cards like vectorization, which is not required for nvidia). I really think a CUDA app would be easier to implement (their software development kit is simply better), but with a GTX280 not able to reach the performance of a HD3850 in double precision, I don't know if it is really worth the effort. The current generation of ATI hardware (RV770) has a factor of 3 performance advantage over the GT200 GPUs from nvidia.
nvidia made a lot of fuss about the double precision units in the GT200 (GTX2xx cards), but frankly they are quite a design flaw from the performance point of view (at least when doing pure double precision calculations). They have 240 high-clocked single precision units, but only 30 double precision units (which can do even less per clock than the single units). The result is that the performance with doubles is only 1/12 of the performance with singles. On the other hand, ATI has incorporated 160 5-issue VLIW units (doing up to five operations on singles). If you want to calculate with doubles, either two or four of the five 32-bit subunits are combined. That means such a VLIW unit is able to produce one or two double results per clock (the 5th subunit can still be used for other things).
So effectively a RV770 is able to churn out between 160 and 320 double results per clock cycle (dividing the single throughput by 5, for adds actually only by 2.5); nvidia is only able to do 30. The higher clock of the nvidia shaders doesn't make up for the massive advantage ATI has in the number of double-capable units, as ATI uses the same units for singles and doubles.
The GPU part of the MW code does close to 150 GFlop per second on a HD4870. Averaged over the whole runtime (a little bit is still calculated on the CPU and you have some overhead, like transferring data to the GPU and so on) it is more than 130 GFlop/s. The theoretical peak performance of a GTX280 is only 78 GFlop/s with doubles. I wouldn't expect more than 50 GFlop/s from a GTX280 on the MW code. So maybe a triple SLI system is as fast as a single HD4870. And a high-clocked Core i7 also already does about 35 GFlop/s (~61 GFlop/s peak at 3.8GHz).
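The peak numbers quoted above can be reproduced with a small back-of-the-envelope calculation. This is a sketch under stated assumptions, not a measurement: the unit counts (240 single and 30 double precision units for the GTX280, 160 VLIW units for the RV770) come from the post, while the clocks (1.296 GHz shader clock for the GTX280, 0.75 GHz for the HD4870) and the per-clock figures (3 flops per single precision unit as MAD + MUL, 2 flops per double precision unit as one fused multiply-add) are the commonly published reference values and are not stated in the thread. The helper name `peak_rate` is made up for this example.

```python
def peak_rate(units, per_unit_per_clock, clock_ghz):
    """Peak rate in giga-operations per second: units * ops per clock * GHz."""
    return units * per_unit_per_clock * clock_ghz


# Assumptions (reference clocks, not from the thread).
GTX280_SHADER_CLOCK_GHZ = 1.296
HD4870_CORE_CLOCK_GHZ = 0.75

# GTX280: 240 single precision units, 30 double precision units (from the post).
gtx280_sp_gflops = peak_rate(240, 3, GTX280_SHADER_CLOCK_GHZ)  # MAD + MUL per clock
gtx280_dp_gflops = peak_rate(30, 2, GTX280_SHADER_CLOCK_GHZ)   # one FMA per clock

# RV770: 160 VLIW units, each giving one or two double results per clock.
rv770_dp_results_low = peak_rate(160, 1, HD4870_CORE_CLOCK_GHZ)
rv770_dp_results_high = peak_rate(160, 2, HD4870_CORE_CLOCK_GHZ)

print(f"GTX280 double precision peak: {gtx280_dp_gflops:.2f} GFlop/s")
print(f"GTX280 single/double ratio:   {gtx280_sp_gflops / gtx280_dp_gflops:.1f}")
print(f"RV770 double results:         {rv770_dp_results_low:.0f} to "
      f"{rv770_dp_results_high:.0f} billion per second")
```

Under these assumptions the GTX280 double precision peak comes out at about 78 GFlop/s and the single-to-double ratio at 12, in line with the figures in the post.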

jedirock Joined: 8 Nov 08 Posts: 178 Credit: 6,140,854 RAC: 0	Message 9504 - Posted: 1 Feb 2009, 7:05:13 UTC
Got some errors with the workunits. I made the mistake of setting the app in, then putting Milkyway wide open. About 7 tasks quit with an error before I suspended the rest. Looking in one of the reported WUs, the exit code is 0xc0000135. Googling for that returns many results for BOINC, most of which seem to say it's a missing DLL. I'm presuming this to be brook.dll. So maybe the app_info has to be tweaked so it's also copied to the slots directory? I'm not sure how to verify what files are in there to check, as BOINC deletes them too quickly for me.

fubared Joined: 8 Apr 08 Posts: 2 Credit: 1,035,343 RAC: 0	Message 9510 - Posted: 1 Feb 2009, 9:25:39 UTC
Quick notes:
* ~20s GPU crunch times on i920@3.5 + 3850/256.
* You can feel the whole machine lagging when the GPU fires up lol. (think the problem is noted a few messages ago)
* ~450s CPU runtime? I hope there's a way of disabling the GPU part so I can run this on my other x64 box with Nvidia card...

Cluster Physik Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0	Message 9513 - Posted: 1 Feb 2009, 11:31:22 UTC - in response to Message 9504.
> Got some errors with the workunits. I made the mistake of setting the app in, then putting Milkyway wide open. About 7 tasks quit with an error before I suspended the rest. Looking in one of the reported WUs, the exit code is 0xc0000135. Googling for that returns many results for BOINC, most of which seem to say it's a missing DLL. I'm presuming this to be brook.dll. So maybe the app_info has to be tweaked so it's also copied to the slots directory? I'm not sure how to verify what files are in there to check, as BOINC deletes them too quickly for me.
Hmm, what have you downloaded? The brook.dll is supplied in the zip file and is also correctly set up in the supplied app_info.xml. Just copy all 3 files to your Milkyway folder (and completely quit BOINC beforehand). But maybe you need to download new WUs, as for the ones you already have there could be some issues with the data in the client_state.xml.
@ the guys for whom it runs: did you have similar problems?

Cluster Physik Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0	Message 9514 - Posted: 1 Feb 2009, 11:41:13 UTC - in response to Message 9510.
> Quick notes:
> * ~20s GPU crunch times on i920@3.5 + 3850/256.
> * You can feel the whole machine lagging when the GPU fires up lol. (think the problem is noted a few messages ago)
> * ~450s CPU runtime? I hope there's a way of disabling the GPU part so I can run this on my other x64 box with Nvidia card...
The lagging gets better with faster cards. And don't start GPU-Z! At least under Vista64 it leads to short (two second) freezes and even some crashed WUs. Have no idea why. The CPU part is just the failsafe backup solution for the longer two-stream WUs, so it isn't the fastest version one could run ;)

Glenn Rogers Joined: 4 Jul 08 Posts: 165 Credit: 364,966 RAC: 0	Message 9515 - Posted: 1 Feb 2009, 12:10:08 UTC - in response to Message 9514. Last modified: 1 Feb 2009, 12:14:23 UTC
So where is the link to the zip file?? Sorry, in reply to Message 9513

sandro Joined: 17 Oct 08 Posts: 16 Credit: 16,783 RAC: 0	Message 9517 - Posted: 1 Feb 2009, 13:50:43 UTC - in response to Message 9515.
> So where is the link to the zip file?? Sorry, in reply to Message 9513
here http://www.file-upload.net/download-1414247/Milkyway_0.16_GPU_SSE3_x64.zip.html

Cluster Physik Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0	Message 9518 - Posted: 1 Feb 2009, 13:51:30 UTC - in response to Message 9515.
> So where is the link to the zip file?? Sorry, in reply to Message 9513
Reading the thread helps, it's here. And remember: Win64 only, ATI HD38x0 or HD48x0 with Cat 8.12 or Cat 9.1 required. The application is in some kind of alpha state, so expect some bugs; tweaking of app_info.xml and cc_config.xml may be required for optimal performance. And don't be disappointed by the credits ;)