CUDA for Milkyway@Home

Glenn Rogers · Joined: 4 Jul 08 · Posts: 165 · Credit: 364,966 · RAC: 0
Message 25245 - Posted: 13 Jun 2009, 12:22:36 UTC - in response to Message 25168.
Thanks for the info. Not such an easy task, it would appear...

SATAN · Joined: 27 Feb 09 · Posts: 45 · Credit: 305,963 · RAC: 0
Message 25251 - Posted: 13 Jun 2009, 13:04:43 UTC
Glenn, it may well be very easy. It's been almost 10 years since I did anything with Unix commands, so getting CUDA itself to work is probably far easier than I found it. I found an idiot's guide. I take my hat off to Travis and CP for building any GPU app.
Mars rules this confectionery war!

Cluster Physik · Joined: 26 Jul 08 · Posts: 627 · Credit: 94,940,203 · RAC: 0
Message 25264 - Posted: 13 Jun 2009, 15:37:04 UTC - in response to Message 25243.
Quote (Message 25243): "Well, I've finally managed to get CUDA working properly on the Mac Pro. Not bad considering it's a slow old 8800GT."
Do you have any performance figures to share? trisf told us a 9600GT on a C2D 6750 took about 15 minutes for the wedge 20 test unit. These test WUs are quite small, so the execution time may be partly limited by the CPU and by the calling overhead of the GPU code. Nevertheless, it would be interesting to have a comparison with the 8800GT.

trisf · Joined: 30 Nov 08 · Posts: 11 · Credit: 25,658 · RAC: 0
Message 25275 - Posted: 13 Jun 2009, 17:34:42 UTC
I tried to run ps_sgr_214F_2s* on my 9600GT with a self-compiled linux64 binary:
1) insane desktop performance slowdown
2) after running for 3 hours I had to kill it
3) CPU load 100%

Glenn Rogers · Joined: 4 Jul 08 · Posts: 165 · Credit: 364,966 · RAC: 0
Message 25284 - Posted: 13 Jun 2009, 18:42:53 UTC - in response to Message 25251.
G'day Satan, I don't have any code-writing experience, or I would have a go at it myself, and my ATI X1300 only handles single precision, so it looks like I have to upgrade my graphics card. I may go trawling for some info on what my card is actually capable of. Absolutely hats off to Cluster and Travis; they have done an outstanding job getting the app up and running.
Glenn

Cluster Physik · Joined: 26 Jul 08 · Posts: 627 · Credit: 94,940,203 · RAC: 0
Message 25294 - Posted: 13 Jun 2009, 19:25:14 UTC - in response to Message 25275.
Quote (trisf, Message 25275): "I tried to run ps_sgr_214F_2s* on my 9600GT with a self-compiled linux64 binary: 1) insane desktop performance slowdown 2) after running for 3 hours I had to kill it 3) CPU load 100%"
Yes, the production WUs are quite a bit larger than the test WUs. As the MW_GPU code does quite a bit more with one WU than the legacy MW@home code (roughly 300 or 400 times as much for the WU you tried to run; I would have to check to give an exact number), it is normal for them to take several hours. The fastest GPUs out there complete these WUs in about 50 seconds with the "classic" algorithm, albeit in double precision. Multiplying that time by 400 gives 20,000 seconds, about 5.5 hours. Such long WUs were actually one of the goals of MW_GPU.
That slow and sluggish GUI behaviour is a side effect of GPU apps that utilize the GPU very heavily. The ATI app also suffered (and to some extent still does) from this. One has to limit the duration of the GPU kernels somehow. That creates short windows in which other tasks (like the screen refresh) can execute, which results in a smoother experience.
The high CPU load should be easy to cure. One only has to put the application to sleep (a millisecond is enough) while it busy-waits for a GPU kernel to complete. That should be one line in the code (at least I hope so).
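The two cures described in that post (keeping each kernel short, and sleeping instead of busy-waiting) can be sketched in plain C. This is a minimal sketch under stated assumptions, not the MW_GPU code: launch() and done() are hypothetical hooks standing in for the real CUDA calls (a kernel launch over one slice of the work, and a completion check such as cudaEventQuery(ev) != cudaErrorNotReady), and a sensible chunk size would have to be found per card. On newer CUDA runtimes, calling cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync) before the context is created is another way to make the host thread block rather than spin.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for nanosleep */
#endif
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical hooks: launch() would start the GPU kernel on items
   [first, first + count); done() would wrap a non-blocking completion
   check such as cudaEventQuery(ev) != cudaErrorNotReady. */
typedef void (*launch_fn)(size_t first, size_t count, void *ctx);
typedef bool (*done_fn)(void *ctx);

/* Poll done() and sleep ~1 ms between polls instead of spinning,
   so the host thread does not sit at 100% CPU. Returns the poll count. */
static unsigned wait_with_sleep(done_fn done, void *ctx)
{
    const struct timespec one_ms = { 0, 1000000L };
    unsigned polls = 1;
    while (!done(ctx)) {
        nanosleep(&one_ms, NULL);
        ++polls;
    }
    return polls;
}

/* Split `total` work items into launches of at most `chunk` items and wait
   for each before starting the next. Short kernels leave gaps in which the
   display driver can redraw the screen. Returns the number of launches. */
static size_t run_in_chunks(size_t total, size_t chunk,
                            launch_fn launch, done_fn done, void *ctx)
{
    size_t launches = 0;
    if (chunk == 0)
        return 0;
    for (size_t first = 0; first < total; first += chunk) {
        size_t count = (total - first < chunk) ? total - first : chunk;
        launch(first, count, ctx);
        wait_with_sleep(done, ctx);
        ++launches;
    }
    return launches;
}

/* Demo hooks (no GPU involved): each "kernel" covers its items at once
   and reports completion on the second poll. */
struct demo_state { size_t covered; int polls_left; };

static void demo_launch(size_t first, size_t count, void *ctx)
{
    struct demo_state *s = ctx;
    (void)first;
    s->covered += count;
    s->polls_left = 2;
}

static bool demo_done(void *ctx)
{
    struct demo_state *s = ctx;
    return --s->polls_left <= 0;
}
```

For example, run_in_chunks(1000, 256, demo_launch, demo_done, &state) makes four launches of 256, 256, 256 and 232 items. The trade-off is that smaller chunks make the desktop smoother but add launch overhead.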

SATAN · Joined: 27 Feb 09 · Posts: 45 · Credit: 305,963 · RAC: 0
Message 25325 - Posted: 13 Jun 2009, 22:04:10 UTC
Cluster, I haven't dared mess with the Milkyway stuff; just making sure CUDA was installed correctly gave me a big enough headache. I will have a go over the next couple of days. I keep screwing something up, because it keeps telling me that no target has been set. I will need to go through it and take a slow, careful look at what I'm screwing up. I doubt I'll notice a slowdown on the desktop, though, as I run the 8800 purely on its own without a monitor connected. Will post back if/when I finally get the damn thing working properly.

Cluster Physik · Joined: 26 Jul 08 · Posts: 627 · Credit: 94,940,203 · RAC: 0
Message 25329 - Posted: 13 Jun 2009, 22:39:22 UTC - in response to Message 25325.
Quote (SATAN, Message 25325): "Will have a go over the next couple of days. I keep screwing something up because it keeps telling me that no target has been set. Will need to go through it and take a slow careful look at what I'm screwing up. I doubt I'll notice a slowdown with the desktop though as I run the 8800 purely on its own without a monitor connected."
That could be the problem. I don't know how it works on a Mac, but under Windows and Linux you have to attach a monitor to the card. Otherwise it is not active and one can't run anything on the GPU.

SATAN · Joined: 27 Feb 09 · Posts: 45 · Credit: 305,963 · RAC: 0
Message 25374 - Posted: 14 Jun 2009, 6:23:46 UTC (last modified: 14 Jun 2009, 6:50:39 UTC)
I had no trouble getting it to run under Boot Camp without a display connected. I don't know whether it is something in the Apple drivers or not, but I can run the CUDA examples such as oceanFFT without problems, and they display perfectly fine. Arkayn might have a better idea of why it works.
[Screenshot: http://img44.imageshack.us/img44/7240/cudascreenshot.th.png]

Emanuel · Joined: 18 Nov 07 · Posts: 280 · Credit: 2,442,757 · RAC: 0
Message 25375 - Posted: 14 Jun 2009, 7:08:22 UTC
According to Nvidia, the requirement to attach a monitor is a strange Microsoft requirement that they could work around, but not without breaking WHQL certification. I don't know the deal with Linux, though.

arkayn · Joined: 14 Feb 09 · Posts: 999 · Credit: 74,932,619 · RAC: 0
Message 25376 - Posted: 14 Jun 2009, 7:12:16 UTC - in response to Message 25374.
Quote (SATAN, Message 25374): "I had no trouble getting it to run under Boot Camp without a display connected. I don't know whether it is something in the Apple drivers or not, but I can run the CUDA examples such as oceanFFT without problems and they show perfectly fine. Arkayn might have a better idea of why it works."
Not really; I hardly know anything about software/driver development. I am pretty good with app_info files, up to when they added all that flops stuff to the mix.

verstapp · Joined: 26 Jan 09 · Posts: 589 · Credit: 497,834,261 · RAC: 0
Message 25380 - Posted: 14 Jun 2009, 8:15:50 UTC (last modified: 14 Jun 2009, 8:18:15 UTC)
Or even... Though you may have to shrink the image to make it fit. Not all of us have wide screens. :)
Cheers,
PeterV

borandi · Joined: 21 Feb 09 · Posts: 180 · Credit: 27,806,824 · RAC: 0
Message 25407 - Posted: 14 Jun 2009, 12:05:13 UTC
There is a way around the monitor issue in Windows without using a second monitor or a dummy plug:
1) Go to your display settings and enable the second monitor as an extension of your desktop AND as the primary monitor. When you click Apply, you'll be left with a screen that shows just your background.
2) Now unplug the monitor cable from its current graphics card and plug it into the one you just enabled. You should be back to your desktop, albeit able to move your mouse off to the left.
This enables both cards. The one drawback is that sometimes (not often) windows will pop up on the other screen. I had that with my MSN Messenger until I dragged the window over; after that it was fine.

Travis · Volunteer moderator · Project administrator · Project developer · Project tester · Project scientist · Joined: 30 Aug 07 · Posts: 2046 · Credit: 26,480 · RAC: 0
Message 25537 - Posted: 15 Jun 2009, 15:38:40 UTC - in response to Message 25329.
Quote (SATAN, Message 25325): "Will have a go over the next couple of days. I keep screwing something up because it keeps telling me that no target has been set. Will need to go through it and take a slow careful look at what I'm screwing up. I doubt I'll notice a slowdown with the desktop though as I run the 8800 purely on its own without a monitor connected."
Quote (Cluster Physik, Message 25329): "That could be the problem. Don't know how it works on a Mac, but under Win and Linux you have to attach a monitor to the card. Otherwise it is not active and one can't run anything on the GPU."
On the new MacBook Pros, you need to go into System Preferences -> Energy Saver and select Higher Performance to use the other (faster) GPU. If you don't want to use that, there's a line in evaluation_gpuX.cu which sets the device (it's at 1; I think it should be changed to 0 to use the on-chip GPU).
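Rather than editing the hard-coded device number in evaluation_gpuX.cu for each machine, the index could be read at startup. This is a minimal sketch, assuming a hypothetical environment variable named MW_GPU_DEVICE (the app does not read any such variable); the parsed value would then be passed to the CUDA runtime's cudaSetDevice().

```c
#include <limits.h>
#include <stdlib.h>

/* Parse a GPU index such as "0" or "1". Falls back to `fallback` when the
   string is missing, empty, not a plain decimal number, or out of range. */
static int parse_device_index(const char *s, int fallback)
{
    char *end;
    long v;

    if (s == NULL || *s == '\0')
        return fallback;
    v = strtol(s, &end, 10);
    if (*end != '\0' || v < 0 || v > INT_MAX)
        return fallback;
    return (int)v;
}

/* Hypothetical usage in the host code (MW_GPU_DEVICE is an assumed name):
 *     int dev = parse_device_index(getenv("MW_GPU_DEVICE"), 0);
 *     cudaSetDevice(dev);
 */
```

Defaulting to 0 matches Travis's suggestion for the on-chip GPU; anything unparseable silently falls back rather than aborting the WU.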

trisf · Joined: 30 Nov 08 · Posts: 11 · Credit: 25,658 · RAC: 0
Message 25827 - Posted: 17 Jun 2009, 19:53:44 UTC
Trying to obtain results for linux_x86_64 CUDA GPU: http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=81905297
After ~4 hours I got this output file:

ps_sgr_214F5_2s_hiw_470211_1245248961_0_0
hessian [14 x 14]:
2.28519259071191482846 -0.39497154621000629682 -3.55474399915678374029 0.74480348740320800882 -0.10156406271555340481 -3.85943627057017080162 -0.81251291805806147295 -5.58602580857936370506 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
-0.39497154621000629682 0.00175542913538606626 0.63195448873898374398 -0.05642450621960675566 -0.01579886098489345983 -0.01579886098489345983 -0.04513960563359534217 0.67709408450393004930 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
-3.55474399915678374029 0.63195448873898374398 -12.18769307698152992714 0.94793169610104166534 -5.28133370369943122569 5.68759017660624976997 7.10948777626896344373 -48.75077341814914433371 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
0.74480348740320800882 -0.05642450621960675566 0.94793169610104166534 0.11736297489406412320 -0.31146327739151052905 -0.14896069933101330207 0.15573163869575526452 1.62502578060497171464 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
-0.10156406271555340481 -0.01579886098489345983 -5.28133370369943122569 -0.31146327739151052905 0.02031282919645605034 0.08125129458136370886 0.12187693076981531703 1.42189759966271367375 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
-3.85943627057017080162 -0.01579886098489345983 5.68759017660624976997 -0.14896069933101330207 0.08125129458136370886 3.14848748184104465508 -0.28437951993254273475 2.64066690736086684410 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
-0.81251291805806147295 -0.04513960563359534217 7.10948777626896344373 0.15573163869575526452 0.12187693076981531703 -0.28437951993254273475 -0.10156411267558951295 3.65630814513906399199 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
-5.58602580857936370506 0.67709408450393004930 -48.75077341814914433371 1.62502578060497171464 1.42189759966271367375 2.64066690736086684410 3.65630814513906399199 16.25025891627273821882 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
gradient[14]: -0.47922010618095528534, 0.00001805584225343814, -0.12744264776820557472, -0.00211388771820253396, 0.00110095497385387375, -0.16134474836393408737, -0.00103189137901082972, 0.00658135446141017069, 0.00000000000000000000, 0.00000000000000000000, 0.00000000000000000000, 0.00000000000000000000, 0.00000000000000000000, 0.00000000000000000000
initial_fitness: -3.12187372012968600288
inital_parameters[14]: 0.30045917696320600943, 30.00000000000000000000, -0.18211869995751800433, 162.48457188946628093618, 13.45735709916518807461, 6.27418922735829376336, 6.28318530717958623200, 8.36662928454264331890, -19.55595416138490350022, 218.02428246770318764902, 7.94448075244863360922, 5.46189868967828839885, 0.00000000000000000000, 18.87596924383203855768
result_fitness: 0.00000000000000000000
result_parameters[14]: 0.00000000000000000000, 0.00000000000000000000, 0.00000000000000000000, 0.00000000000000000000, 0.00000000000000000000, 0.00000000000000000000, 0.00000000000000000000, 0.00000000000000000000, 0.00000000000000000000, 0.00000000000000000000, 0.00000000000000000000, 0.00000000000000000000, 0.00000000000000000000, 0.00000000000000000000
number_evaluations: 447
metadata: p: 27, v: 0.00005843793239637506 25.03408585491482796215 -0.01973286613357989189 9.72341257218874055468 0.99125759582857653207 0.07406591432380578433 5.64256730759030311617 0.22236400978163128883 0.02643427419835397973 -1.21804191646168202823 -12.80871199567133800201 -0.82128661750129838826 -3.16154559902874643385 2.90070763055914682127

and some stderr.txt:

APP: error reading hessian checkpoint file (for read): data_file == NULL
shmget in attach_shmem: Invalid argument
Can't set up shared mem: -1
Will run in standalone mode.
APP: error reading hessian checkpoint file (for read): data_file == NULL
APP: error reading hessian checkpoint file (for write): data_file == NULL
called boinc_finish
shmget in attach_shmem: Invalid argument
Can't set up shared mem: -1
Will run in standalone mode.

The WU is still running.

SATAN · Joined: 27 Feb 09 · Posts: 45 · Credit: 305,963 · RAC: 0
Message 25832 - Posted: 17 Jun 2009, 21:10:51 UTC
I'll give it a go when Travis posts the updated code files. I can't say that I will have any success, though.

Cluster Physik · Joined: 26 Jul 08 · Posts: 627 · Credit: 94,940,203 · RAC: 0
Message 25894 - Posted: 18 Jun 2009, 10:11:25 UTC - in response to Message 25827.
Quote (trisf, Message 25827): "Trying to obtain results for linux_x86_64 cuda gpu http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=81905297 after ~4hours got this out result ps_sgr_214F5_2s_hiw_470211_1245248961_0_0 [..] number_evaluations: 447 [..]"
As this is a 2-stream WU with a double-sized wedge, I can give a comparison with an HD3870 (overclocked to 860 MHz). I haven't run a whole WU yet (it takes too long ;)), but I know the time for a single evaluation. As the number of evaluations is given in the output file, I can say that the HD3870 would take about 8000 seconds (2:15 hours) for the 447 evaluations (roughly 18 seconds per evaluation). This number is deduced from a normal-sized wedge, so there is some uncertainty to it (maybe 20%).
What graphics card do you use? Was it a 9600GT? It has 64 stream processors, as opposed to the 112 to 128 of the 8800GT/GTX and 9800GT/GTX series. That would mean a G92-based graphics card is roughly as fast as an HD3870 with the current code, or a bit faster depending on the clock and the exact number of enabled units. The GT200 would then battle it out with the HD4800 series ;)
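The HD3870 estimate is plain arithmetic (evaluations × seconds per evaluation), and the G92 comparison follows from scaling the 9600GT's time by stream-processor count. The small C sketch below makes both steps explicit. It assumes, as a simplification rather than a measured fact, that runtime scales inversely with the number of stream processors at equal clocks; clock speed, memory bandwidth and CPU overhead are ignored.

```c
/* Estimated runtime in seconds: number of evaluations times the
   measured seconds per evaluation. */
static double estimate_seconds(unsigned evaluations, double sec_per_eval)
{
    return evaluations * sec_per_eval;
}

/* Scale a measured runtime to a card with a different number of stream
   processors. ASSUMPTION: runtime is inversely proportional to shader
   count at equal clocks; this is only a rough first-order model. */
static double scale_by_shaders(double measured_sec, unsigned from_sp, unsigned to_sp)
{
    return measured_sec * (double)from_sp / (double)to_sp;
}
```

With the numbers from the thread, estimate_seconds(447, 18.0) gives 8046 s for the HD3870. Taking trisf's ~4 hours (14,400 s) on the 64-SP 9600GT, scale_by_shaders(14400.0, 64, 128) gives 7200 s and scale_by_shaders(14400.0, 64, 112) about 8229 s, both in the same range as the HD3870 figure.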

trisf · Joined: 30 Nov 08 · Posts: 11 · Credit: 25,658 · RAC: 0
Message 25921 - Posted: 18 Jun 2009, 15:15:26 UTC
Thanks, CP. Yes, it was a 9600GT.
Strange behaviour: when you stop the project, the WUs don't stop and continue to run. Only killing BOINC helps.

Glenn Rogers · Joined: 4 Jul 08 · Posts: 165 · Credit: 364,966 · RAC: 0
Message 25942 - Posted: 18 Jun 2009, 18:14:43 UTC - in response to Message 25921.
1) In the BOINC Manager Options menu, check the checkbox that enables the manager exit menu, then click OK.
2) Then choose File -> Exit. The dialog box should have a checkbox to stop science applications when exiting the manager; make sure it is checked and click OK.
That should be it.

Cluster Physik · Joined: 26 Jul 08 · Posts: 627 · Credit: 94,940,203 · RAC: 0
Message 25954 - Posted: 18 Jun 2009, 20:23:37 UTC - in response to Message 25827. (last modified: 18 Jun 2009, 20:31:36 UTC)
Quote (trisf, Message 25827): "Trying to obtain results for linux_x86_64 cuda gpu http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=81905297 after ~4hours got this out result ps_sgr_214F5_2s_hiw_470211_1245248961_0_0"
By the way, there may be a bug in the CUDA version when initializing the stream_c parameters. Compare the init_constants function from the CPU version

    if (ap->sgr_coordinates == 0) {
        atGCToEq(ap->stream_parameters[i][0], 0, &ra, &dec, get_node(), wedge_incl(ap->wedge));
        atEqToGal(ra, dec, &l, &b);
    } else if (ap->sgr_coordinates == 1) {
        gcToSgr(ap->stream_parameters[i][0], 0, ap->wedge, &lamda, &beta); //vickej2
        sgrToGal(lamda, beta, &l, &b); //vickej2
    } else {
        printf("Error: sgr_coordinates not valid");
    }
    lbr[0] = l;
    lbr[1] = b;
    lbr[2] = ap->stream_parameters[i][1];
    lbr2xyz(lbr, stream_c[i]);

with the beginning of gpu__likelihood

    gc_to_gal(wedge, stream_parameters(i,0) * D_DEG2RAD, 0 * D_DEG2RAD, &(lbr[0]), &(lbr[1]));
    lbr[2] = stream_parameters(i,1);
    d_lbr2xyz(lbr, stream_c);

and one sees that the CUDA version lacks the if statement for the SGR coordinates. In effect, the CUDA version assumes that no SGR coordinates are used. At least this is how I read the code: the rotation matrix used in gc_to_gal is the same as in atEqToGal. I will stay with the CPU code version of that for the time being ;)
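One way the GPU-side setup could mirror the CPU branch is sketched below. This is a hypothetical sketch, not a patch for the project: the to_gal_fn signature is an assumption, with each callback expected to wrap a complete conversion chain (for example atGCToEq followed by atEqToGal, or gcToSgr followed by sgrToGal), and angles are in whatever unit the wrapped transforms expect (the quoted GPU call converts degrees to radians first). The demo transforms only exercise the dispatch and are not real coordinate conversions. Unlike the quoted CPU code, which prints an error and then continues with l and b unset, the sketch returns an error code.

```c
#include <stdio.h>

/* Assumed signature: convert great-circle longitude `mu` on stripe `wedge`
   into galactic longitude/latitude (l, b). */
typedef void (*to_gal_fn)(int wedge, double mu, double *l, double *b);

/* Fill lbr[3] for one stream centre, choosing the transform the same way
   the CPU init_constants does. Returns 0 on success, -1 if
   sgr_coordinates is neither 0 nor 1 (lbr is left untouched then). */
static int stream_center_lbr(int sgr_coordinates, int wedge, double mu, double r,
                             to_gal_fn gc_transform, to_gal_fn sgr_transform,
                             double lbr[3])
{
    double l, b;

    if (sgr_coordinates == 0) {
        gc_transform(wedge, mu, &l, &b);
    } else if (sgr_coordinates == 1) {
        sgr_transform(wedge, mu, &l, &b);
    } else {
        fprintf(stderr, "Error: sgr_coordinates not valid\n");
        return -1;
    }
    lbr[0] = l;
    lbr[1] = b;
    lbr[2] = r;
    return 0;
}

/* Demo transforms for testing the dispatch only (NOT real conversions):
   they tag b so the caller can see which branch ran. */
static void demo_gc(int wedge, double mu, double *l, double *b)
{
    (void)wedge;
    *l = mu;
    *b = 1.0;
}

static void demo_sgr(int wedge, double mu, double *l, double *b)
{
    (void)wedge;
    *l = mu;
    *b = 2.0;
}
```

The resulting lbr would then go to d_lbr2xyz exactly as in the quoted GPU code; only the choice of transform changes.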