CUDA for Milkyway@Home

Glenn Rogers
Message 25245 - Posted: 13 Jun 2009, 12:22:36 UTC - in response to Message 25168.  

Thanks for the info. Not such an easy task, it would appear...

SATAN
Message 25251 - Posted: 13 Jun 2009, 13:04:43 UTC

Glenn, it may well be very easy. It's been almost 10 years since I did anything related to Unix commands, so getting CUDA itself to work is probably far easier than I found it. I found an idiot's guide.

I take my hat off to Travis and CP for building any GPU app.
Mars rules this confectionery war!

Cluster Physik
Message 25264 - Posted: 13 Jun 2009, 15:37:04 UTC - in response to Message 25243.  

> Well, I've finally managed to get CUDA working properly on the Mac Pro. Not bad considering it's a slow old 8800GT.

Do you have any performance figures to share? trisf told us a 9600GT on a C2D 6750 took about 15 minutes for the wedge 20 test unit. These test WUs are quite small, so the execution time may be limited to some extent by the CPU and all the calling overhead for the GPU stuff. Nevertheless, it would be interesting to have a comparison with the 8800GT.

trisf
Message 25275 - Posted: 13 Jun 2009, 17:34:42 UTC

I tried to run ps_sgr_214F_2s* on my 9600GT with a self-compiled linux64 binary...

1) Insane desktop performance slowdown.

2) After it had run for 3 hours I had to kill it.

3) CPU load was 100%.

Glenn Rogers
Message 25284 - Posted: 13 Jun 2009, 18:42:53 UTC - in response to Message 25251.  

G'day Satan, I don't have any code-writing experience or I would have a go at it myself, and my ATI X1300 only handles single precision, so it looks like I have to upgrade my graphics card... May have to go trolling for some info on what my card is actually capable of...

Absolutely hats off to Cluster and Travis, they have done an outstanding job getting the app up and running.

Glenn

Cluster Physik
Message 25294 - Posted: 13 Jun 2009, 19:25:14 UTC - in response to Message 25275.  

> I tried to run ps_sgr_214F_2s* on my 9600gt and self compiled linux64 binary...
>
> 1) insane desktop performance slowdown
>
> 2) after running 3hours i have to kill it
>
> 3) CPU load 100%

Yes, the production WUs are quite a bit larger than the test WUs. As the code for MW_GPU does quite a bit more with one WU than the legacy MW@home code (roughly 300 or 400 times as much for the WU you tried to run; I would have to check to give an exact number), it is normal for them to take several hours. The fastest GPUs out there complete these WUs in about 50 seconds with the "classic" algorithm, albeit in double precision. Multiplying that time by 400 gives about 5.5 hours. Such long WUs were actually one of the goals of MW_GPU.
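
The estimate above is plain scaling, and it can be checked with a few lines of C. This is only an illustrative sketch: the helper name `estimated_hours` is made up here, and the 50 second and 300 to 400 times figures are the rough numbers from the post, not measurements.

```c
#include <assert.h>

/* Hypothetical helper (not part of the MilkyWay@home code): scale the
 * runtime of a legacy work unit by the amount of extra work in one MW_GPU
 * work unit, and convert the result from seconds to hours. */
double estimated_hours(double legacy_seconds, double work_factor)
{
    assert(legacy_seconds >= 0.0 && work_factor >= 0.0);
    return legacy_seconds * work_factor / 3600.0;
}
```

With the figures from the post, `estimated_hours(50.0, 400.0)` comes out at roughly 5.6 hours, and a factor of 300 gives a little over 4 hours.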

That slow and sluggish behaviour of the GUI is a side effect of GPU apps with very high GPU utilization. The ATI app also suffered from this (and still does to some extent). One has to limit the duration of the GPU kernels somehow. That creates short windows in which other tasks (like the screen refresh) can execute, which results in a smoother experience.
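
One common way to limit kernel duration is to split the work into several shorter launches. The sketch below shows only the host-side slicing in plain C. It is not taken from the MW_GPU source: `launch_fn`, `run_in_chunks` and `count_items` are hypothetical names, and the callback merely stands in for a real kernel launch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one short GPU kernel launch that processes
 * `count` items starting at index `first`. */
typedef void (*launch_fn)(size_t first, size_t count, void *ctx);

/* Split `total` items into launches of at most `chunk` items each, so that
 * no single launch keeps the GPU busy for too long. Returns the number of
 * launches issued. */
size_t run_in_chunks(size_t total, size_t chunk, launch_fn launch, void *ctx)
{
    size_t launches = 0;
    size_t first = 0;

    assert(chunk > 0 && launch != NULL);
    while (first < total) {
        size_t remaining = total - first;
        size_t count = remaining < chunk ? remaining : chunk;
        launch(first, count, ctx);  /* other GPU work can slot in between launches */
        first += count;
        ++launches;
    }
    return launches;
}

/* Demo callback: just adds up how many items were handed to it. */
void count_items(size_t first, size_t count, void *ctx)
{
    (void)first;
    *(size_t *)ctx += count;
}
```

For example, 10 items with a chunk size of 4 are processed in three launches of 4, 4 and 2 items. A smaller chunk size gives a smoother desktop at the cost of more launch overhead.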

The high CPU load should be easy to cure. One only has to put the application to sleep (a millisecond is enough) while it busy-waits for the completion of a GPU kernel. That should be one line in the code (at least I hope so).
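
A minimal sketch of that idea in C, assuming a POSIX system with `nanosleep`. The names `done_fn`, `wait_with_sleep` and `countdown_done` are hypothetical, and the callback only stands in for whatever non-blocking "is the kernel finished?" check the application really uses.

```c
#define _POSIX_C_SOURCE 199309L  /* makes nanosleep visible in <time.h> */
#include <assert.h>
#include <time.h>

/* Hypothetical completion check: returns nonzero once the kernel is done. */
typedef int (*done_fn)(void *ctx);

/* Poll for completion, sleeping one millisecond between polls instead of
 * spinning at 100% CPU. Returns how many times the caller slept. */
unsigned long wait_with_sleep(done_fn done, void *ctx)
{
    const struct timespec one_ms = { .tv_sec = 0, .tv_nsec = 1000000L };
    unsigned long sleeps = 0;

    assert(done != NULL);
    while (!done(ctx)) {
        nanosleep(&one_ms, NULL);  /* give the CPU back to the OS */
        ++sleeps;
    }
    return sleeps;
}

/* Demo check: reports "not done" until the counter has reached zero. */
int countdown_done(void *ctx)
{
    int *polls_left = ctx;
    if (*polls_left == 0)
        return 1;
    --*polls_left;
    return 0;
}
```

The trade-off is latency: with a 1 ms sleep the host notices completion up to about a millisecond late, which hardly matters when a kernel runs for much longer than that.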

SATAN
Message 25325 - Posted: 13 Jun 2009, 22:04:10 UTC

Cluster,

I haven't dared mess with the Milkyway stuff. It gave me a big enough headache just making sure CUDA was installed correctly.

Will have a go over the next couple of days. I keep screwing something up, because it keeps telling me that no target has been set. Will need to go through and take a slow, careful look at what I'm screwing up.

I doubt I'll notice a slowdown with the desktop though, as I run the 8800 purely on its own without a monitor connected. Will post back if/when I finally get the damn thing working properly.
Mars rules this confectionery war!

Cluster Physik
Message 25329 - Posted: 13 Jun 2009, 22:39:22 UTC - in response to Message 25325.  

> Will have a go over the next couple of days. I keep screwing something up because it keeps telling me that not target has been set. Will need to go through take a slow careful look at what i'm screwing up.
>
> I doubt i'll notice a slowdown with the desktop though as I run the 8800 purely on its own without a monitor connected.

That could be the problem. Don't know how it works on a Mac, but under Win and Linux you have to attach a monitor to the card. Otherwise it is not active and one can't run anything on the GPU.

SATAN
Message 25374 - Posted: 14 Jun 2009, 6:23:46 UTC
Last modified: 14 Jun 2009, 6:50:39 UTC

I had no trouble getting it to run under BootCamp without a display connected. I don't know whether it is something in the Apple drivers or not, but I can run the CUDA examples such as oceanFFT with no problems, and they display perfectly fine.

Arkayn might have a better idea of why it works.

[img=http://img44.imageshack.us/img44/7240/cudascreenshot.th.png]
Mars rules this confectionery war!

Emanuel
Message 25375 - Posted: 14 Jun 2009, 7:08:22 UTC

According to Nvidia the requirement of having to attach a monitor is a strange Microsoft requirement that they could work around - but not without breaking WHQL certification. I don't know the deal with Linux though.

arkayn
Message 25376 - Posted: 14 Jun 2009, 7:12:16 UTC - in response to Message 25374.  

> I had no trouble getting it to run under BootCamp without a display connected. I don't know whether it is something in the Apple drivers or not, but I can run the CUDA examples such as oceanFFT no problems and they show perfectly fine.
>
> Arkayn might have a better idea of why it works.
>
> [img=http://img44.imageshack.us/img44/7240/cudascreenshot.th.png]

Not really, I hardly know anything about software/driver development.

I am pretty good with app_info files, up to the point when they added all that fplops to the mix.

verstapp
Message 25380 - Posted: 14 Jun 2009, 8:15:50 UTC
Last modified: 14 Jun 2009, 8:18:15 UTC

Or even...






Though you may have to shrink the image to make it fit. Not all of us have wide screens. :)
Cheers,

PeterV

borandi
Message 25407 - Posted: 14 Jun 2009, 12:05:13 UTC

There is a way around the monitor issue in Windows without using a second monitor or a dummy plug.

Go to your display settings and enable the second monitor as an extension of your desktop AND as the primary monitor. When you click Apply, you'll be left with a screen that shows just your background. Now unplug the monitor cable from its current graphics card and plug it into the one you just enabled. You should be back at your desktop, albeit able to move your mouse off to the left. This enables both cards.

The one drawback is that sometimes (not often) windows will pop up on the other screen. I had it with MSN Messenger, until I dragged the window over, and then it was fine.

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Message 25537 - Posted: 15 Jun 2009, 15:38:40 UTC - in response to Message 25329.  

> Will have a go over the next couple of days. I keep screwing something up because it keeps telling me that not target has been set. Will need to go through take a slow careful look at what i'm screwing up.
>
> I doubt i'll notice a slowdown with the desktop though as I run the 8800 purely on its own without a monitor connected.
>
> That could be the problem. Don't know how it works on a Mac, but under Win and Linux you have to attach a monitor to the card. Otherwise it is not active and one can't run anything on the GPU.

On the new MacBook Pros, you need to go into System Preferences -> Energy Saver and select higher performance to use the other (faster) GPU. If you don't want to use that, there's a line in evaluation_gpuX.cu which sets the device (it's at 1; I think it should be changed to 0 to use the on-chip GPU).

trisf
Message 25827 - Posted: 17 Jun 2009, 19:53:44 UTC

Trying to obtain results for the linux_x86_64 CUDA GPU app: http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=81905297

After ~4 hours I got this output for result ps_sgr_214F5_2s_hiw_470211_1245248961_0_0:

hessian [14 x 14]:
2.28519259071191482846 -0.39497154621000629682 -3.55474399915678374029 0.74480348740320800882 -0.10156406271555340481 -3.85943627057017080162 -0.81251291805806147295 -5.58602580857936370506 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
-0.39497154621000629682 0.00175542913538606626 0.63195448873898374398 -0.05642450621960675566 -0.01579886098489345983 -0.01579886098489345983 -0.04513960563359534217 0.67709408450393004930 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
-3.55474399915678374029 0.63195448873898374398 -12.18769307698152992714 0.94793169610104166534 -5.28133370369943122569 5.68759017660624976997 7.10948777626896344373 -48.75077341814914433371 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
0.74480348740320800882 -0.05642450621960675566 0.94793169610104166534 0.11736297489406412320 -0.31146327739151052905 -0.14896069933101330207 0.15573163869575526452 1.62502578060497171464 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
-0.10156406271555340481 -0.01579886098489345983 -5.28133370369943122569 -0.31146327739151052905 0.02031282919645605034 0.08125129458136370886 0.12187693076981531703 1.42189759966271367375 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
-3.85943627057017080162 -0.01579886098489345983 5.68759017660624976997 -0.14896069933101330207 0.08125129458136370886 3.14848748184104465508 -0.28437951993254273475 2.64066690736086684410 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
-0.81251291805806147295 -0.04513960563359534217 7.10948777626896344373 0.15573163869575526452 0.12187693076981531703 -0.28437951993254273475 -0.10156411267558951295 3.65630814513906399199 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
-5.58602580857936370506 0.67709408450393004930 -48.75077341814914433371 1.62502578060497171464 1.42189759966271367375 2.64066690736086684410 3.65630814513906399199 16.25025891627273821882 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000 0.00000000000000000000
gradient[14]: -0.47922010618095528534, 0.00001805584225343814, -0.12744264776820557472, -0.00211388771820253396, 0.00110095497385387375, -0.16134474836393408737, -0.00103189137901082972, 0.00658135446141017069, 0.00000000000000000000, 0.00000000000000000000, 0.00000000000000000000, 0.00000000000000000000, 0.00000000000000000000, 0.00000000000000000000
initial_fitness: -3.12187372012968600288
inital_parameters[14]: 0.30045917696320600943, 30.00000000000000000000, -0.18211869995751800433, 162.48457188946628093618, 13.45735709916518807461, 6.27418922735829376336, 6.28318530717958623200, 8.36662928454264331890, -19.55595416138490350022, 218.02428246770318764902, 7.94448075244863360922, 5.46189868967828839885, 0.00000000000000000000, 18.87596924383203855768
result_fitness: 0.00000000000000000000
result_parameters[14]: 0.00000000000000000000, 0.00000000000000000000, 0.00000000000000000000, 0.00000000000000000000, 0.00000000000000000000, 0.00000000000000000000, 0.00000000000000000000, 0.00000000000000000000, 0.00000000000000000000, 0.00000000000000000000, 0.00000000000000000000, 0.00000000000000000000, 0.00000000000000000000, 0.00000000000000000000
number_evaluations: 447
metadata: p: 27, v: 0.00005843793239637506 25.03408585491482796215 -0.01973286613357989189 9.72341257218874055468 0.99125759582857653207 0.07406591432380578433 5.64256730759030311617 0.22236400978163128883 0.02643427419835397973 -1.21804191646168202823 -12.80871199567133800201 -0.82128661750129838826 -3.16154559902874643385 2.90070763055914682127


and some of stderr.txt:

APP: error reading hessian checkpoint file (for read): data_file == NULL
shmget in attach_shmem: Invalid argument
Can't set up shared mem: -1
Will run in standalone mode.
APP: error reading hessian checkpoint file (for read): data_file == NULL
APP: error reading hessian checkpoint file (for write): data_file == NULL
called boinc_finish
shmget in attach_shmem: Invalid argument
Can't set up shared mem: -1
Will run in standalone mode.


The WU is still running.

SATAN
Message 25832 - Posted: 17 Jun 2009, 21:10:51 UTC

I'll give it a go when Travis posts the updated code files. I can't say that I will have any success though.
Mars rules this confectionery war!

Cluster Physik
Message 25894 - Posted: 18 Jun 2009, 10:11:25 UTC - in response to Message 25827.  

> Trying to obtain results for linux_x86_64 cuda gpu http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=81905297
>
> after ~4hours got this out result ps_sgr_214F5_2s_hiw_470211_1245248961_0_0
>
> [..]
> number_evaluations: 447
> [..]


As this is a 2-stream WU with a double-sized wedge, I can give some comparison with an HD3870 (overclocked to 860 MHz). I've not run a whole WU yet (takes too long ;)), but I know the time for a single evaluation. As the number of evaluations is given in the output file, I can say that the HD3870 would take about 8000 seconds (2:15 hours) for the 447 evaluations (roughly 18 seconds per evaluation). This number is extrapolated from a normal-sized wedge, so there is some uncertainty in it (maybe 20%).

What graphics card do you use? Was it a 9600GT? It has 64 stream processors, as opposed to the 112 to 128 of the 8800GT/GTX and 9800GT/GTX series. That would mean a G92-based graphics card is roughly as fast as an HD3870 with the current code, or a bit faster depending on the clock and the exact number of enabled units.
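
The comparison can be spelled out as a small calculation. The C sketch below is illustrative only: both helper names are made up, and scaling runtime purely by stream processor count is a simplifying assumption that ignores clock speed and architectural differences.

```c
#include <assert.h>

/* Hypothetical helper: total time for a run, given the number of likelihood
 * evaluations and a fixed cost per evaluation in seconds. */
double total_seconds(unsigned evaluations, double seconds_per_evaluation)
{
    assert(seconds_per_evaluation >= 0.0);
    return evaluations * seconds_per_evaluation;
}

/* Hypothetical helper: rescale a measured runtime to a GPU with a different
 * number of stream processors, assuming runtime is inversely proportional
 * to the unit count (clocks and architecture are ignored). */
double scaled_seconds(double measured_seconds, unsigned measured_units,
                      unsigned target_units)
{
    assert(measured_units > 0 && target_units > 0);
    return measured_seconds * measured_units / target_units;
}
```

`total_seconds(447, 18.0)` gives 8046 seconds for the HD3870. Scaling the roughly 4 hours (14400 seconds) reported for the 64-unit 9600GT to 112 and 128 units gives about 8230 and 7200 seconds, which brackets the HD3870 estimate within its stated uncertainty.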

The GT200 would battle it out with the HD4800 series then ;)

trisf
Message 25921 - Posted: 18 Jun 2009, 15:15:26 UTC

Thanks CP.
Yes, it was a 9600GT.

Strange behavior:
when you stop the project, the WUs don't stop and continue to run.
Only killing BOINC helps.

Glenn Rogers
Message 25942 - Posted: 18 Jun 2009, 18:14:43 UTC - in response to Message 25921.  

In the BOINC Manager options menu, check the "enable manager exit menu" check box, then click OK. Then choose File and Exit; the dialog box should have a "stop science applications when exiting manager" checkbox. Make sure this is checked and click OK. That should be it.

Cluster Physik
Message 25954 - Posted: 18 Jun 2009, 20:23:37 UTC - in response to Message 25827.  
Last modified: 18 Jun 2009, 20:31:36 UTC

> Trying to obtain results for linux_x86_64 cuda gpu http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=81905297
>
> after ~4hours got this out result ps_sgr_214F5_2s_hiw_470211_1245248961_0_0

By the way, there may be a bug in the CUDA version when initializing the stream_c parameters. If one compares the init_constants function from the CPU version
		if (ap->sgr_coordinates == 0) {
			atGCToEq(ap->stream_parameters[i][0], 0, &ra, &dec, get_node(), wedge_incl(ap->wedge));
			atEqToGal(ra, dec, &l, &b);
		} else if (ap->sgr_coordinates == 1) {
			gcToSgr(ap->stream_parameters[i][0], 0, ap->wedge, &lamda, &beta); //vickej2
			sgrToGal(lamda, beta, &l, &b); //vickej2
		} else {
			printf("Error: sgr_coordinates not valid");
		}
		lbr[0] = l;
		lbr[1] = b;
		lbr[2] = ap->stream_parameters[i][1];
		lbr2xyz(lbr, stream_c[i]);

with the beginning of gpu__likelihood
		gc_to_gal(wedge, stream_parameters(i,0) * D_DEG2RAD, 0 * D_DEG2RAD, &(lbr[0]), &(lbr[1]));
		lbr[2] = stream_parameters(i,1);
		d_lbr2xyz(lbr, stream_c);

one sees that the CUDA version lacks the if statement for the SGR coordinates. In effect, the CUDA version assumes that no SGR coordinates are used. At least this is how I read the code: the rotation matrix used in gc_to_gal is the same as in atEqToGal.
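
To make the missing branch concrete, here is a rough C sketch of the kind of dispatch the GPU-side setup would need. It is not the project's code: `to_gal_fn`, `stream_center_lb` and the two `fake_*` conversions are hypothetical names, and the placeholders do no real coordinate math. They only mark which path was taken.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical signature for a routine that converts a stream position
 * (mu, nu) in the given wedge to Galactic l and b. */
typedef void (*to_gal_fn)(int wedge, double mu, double nu, double *l, double *b);

/* Pick the conversion path from the sgr_coordinates flag, mirroring the
 * if/else in the CPU version's init_constants. Returns 0 on success and
 * -1 when the flag is neither 0 nor 1. */
int stream_center_lb(int sgr_coordinates, int wedge, double mu,
                     to_gal_fn gc_path, to_gal_fn sgr_path,
                     double *l, double *b)
{
    assert(gc_path != NULL && sgr_path != NULL && l != NULL && b != NULL);
    if (sgr_coordinates == 0) {
        gc_path(wedge, mu, 0.0, l, b);   /* great-circle path, as in the CUDA code */
    } else if (sgr_coordinates == 1) {
        sgr_path(wedge, mu, 0.0, l, b);  /* the branch the CUDA version lacks */
    } else {
        return -1;
    }
    return 0;
}

/* Placeholder conversions: they only tag which path ran. */
void fake_gc_path(int wedge, double mu, double nu, double *l, double *b)
{
    (void)wedge; (void)mu; (void)nu;
    *l = 1.0;
    *b = 1.0;
}

void fake_sgr_path(int wedge, double mu, double nu, double *l, double *b)
{
    (void)wedge; (void)mu; (void)nu;
    *l = 2.0;
    *b = 2.0;
}
```

In a real fix the two function pointers would be replaced by the actual conversion routines, and the invalid-flag case would report an error the way the CPU version does with its printf.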

I will stay with the CPU code version of that for the time being ;)

©2024 Astroinformatics Group