milkyway & milkywayGPU makefile

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 23829 - Posted: 1 Jun 2009 | 8:26:58 UTC

Here's a thread for discussing (and improving) the makefile we're using for milkyway. The newest code release has a combined Linux, OS X and GPU makefile. I've tested it on OS X and it works fine. Unfortunately, AFAIK there is no 64-bit or PPC version of CUDA for OS X, so it will only compile an i686 binary.

I don't have a Linux machine with a GPU to test the makefile, so let me know if it works there (I'm pretty sure it should, as it shouldn't be doing anything different than on OS X).
____________

Profile KSMarksPsych
Joined: 9 Sep 07
Posts: 22
Credit: 320,035
RAC: 0
Message 23831 - Posted: 1 Jun 2009 | 10:13:02 UTC

I have a 64-bit Linux machine with a CUDA-capable card. Getting it to work with BOINC hasn't gone well.

If you can give a general idea of what you'd like me to do, I'd be happy to get compiling ;)
____________
Kathryn :o)
The BOINC FAQ Service
The Unofficial BOINC Wiki
The Trac System
More BOINC information than you can shake a stick of RAM at.

Profile Travis
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 23832 - Posted: 1 Jun 2009 | 10:39:30 UTC - in response to Message 23831.


Welp, your guess is as good as mine :) JK.

Well, you'd need to download the CUDA driver and toolkit: http://www.nvidia.com/object/cuda_get.html

You can test and see if it works with the samples.

After that, you should be able to just download the latest GPU code from: http://milkyway.cs.rpi.edu/milkyway/download/code_release/

After unzipping, you should be able to go to the /milkyway/bin/ directory and try running the makefile:

make linux_x86_64_gpu

You'll probably need to edit the makefile so that its directory paths point to where you have BOINC and CUDA installed.
____________

trisf
Joined: 30 Nov 08
Posts: 11
Credit: 25,658
RAC: 0
Message 23838 - Posted: 1 Jun 2009 | 12:27:29 UTC
Last modified: 1 Jun 2009 | 13:15:01 UTC

Some questions about compiling with make linux_x86_64_gpu:
1) What is "evaluator.h", which is referenced in evaluation/simple_evaluator.c and searches/[hessian,line_search,gradient].c? Is it evaluation/simple_evaluator.h?
2) The evaluate function used in searches/[gradient,hessian,regression,line_search].c has no visible declaration:

../searches/hessian.c:127: error: 'evaluate' was not declared in this scope
../searches/hessian.c: In function 'void get_hessian(int, double*, double*, double**)':
../searches/hessian.c:188: error: 'evaluate' was not declared in this scope
../searches/hessian.c:196: error: 'evaluate' was not declared in this scope
make: *** [../searches/hessian.o] Error 1

PS: Sorry for my English.

jedirock
Joined: 8 Nov 08
Posts: 178
Credit: 6,140,854
RAC: 0
Message 23857 - Posted: 1 Jun 2009 | 18:25:06 UTC - in response to Message 23829.

AFAIK, there is no CUDA library for OS X, period. Nvidia doesn't release separate drivers, so they have to work with Apple to get them into an OS update. With Apple pushing OpenCL though, they may have to go with OpenCL first. I don't know what Apple's schedule is on that.
____________

Profile Travis
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 23888 - Posted: 1 Jun 2009 | 23:09:06 UTC - in response to Message 23857.

There's a 32-bit CUDA library for Intel Macs. That's what I've been using.
____________

Profile Travis
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 23889 - Posted: 1 Jun 2009 | 23:10:04 UTC - in response to Message 23838.
Last modified: 1 Jun 2009 | 23:17:49 UTC


Looks like I missed yet another file :( I'll update the v0.05 release.

*update*

Ok it should be in there now.
____________

trisf
Joined: 30 Nov 08
Posts: 11
Credit: 25,658
RAC: 0
Message 23911 - Posted: 2 Jun 2009 | 5:13:05 UTC

bin/Makefile, line 147: there is a missing space between the output file name and $(APP_OBJS).

Current:
$(OBJ_CXX) $(OBJ_CXXFLAGS) $(LDFLAGS_x86_64) -o milkywayGPU_$(APP_VERSION)_x86_64-pc-linux-gnu$(APP_OBJS) $(SEARCH_OBJS) $(UTIL_OBJS) $(GPU_APP_OBJS) -lboinc -lboinc_api -lcudart
Fixed:
$(OBJ_CXX) $(OBJ_CXXFLAGS) $(LDFLAGS_x86_64) -o milkywayGPU_$(APP_VERSION)_x86_64-pc-linux-gnu $(APP_OBJS) $(SEARCH_OBJS) $(UTIL_OBJS) $(GPU_APP_OBJS) -lboinc -lboinc_api -lcudart

Profile Travis
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 23912 - Posted: 2 Jun 2009 | 5:22:28 UTC - in response to Message 23911.


Nice catch, it'll be in the next update.
____________

trisf
Joined: 30 Nov 08
Posts: 11
Credit: 25,658
RAC: 0
Message 23917 - Posted: 2 Jun 2009 | 7:24:51 UTC
Last modified: 2 Jun 2009 | 7:38:52 UTC

linux_x86_64_gpu: maybe it's a problem on my end, but linking against libboinc_api.a fails without OpenSSL. I just added -lssl to line 147.

How do I run the test units with milkywayGPU_0.18_x86_64-pc-linux-gnu?

Update: I renamed *-20.txt to *txt and ran it. It looks like it works.

trisf
Joined: 30 Nov 08
Posts: 11
Credit: 25,658
RAC: 0
Message 23919 - Posted: 2 Jun 2009 | 8:02:27 UTC
Last modified: 2 Jun 2009 | 8:03:22 UTC

Output from the linux_x86_64_gpu build; sorry for the huge post.


initial likelihood: -2.98530684176687044484

point[14]: 0.57171300000000002672, 12.31211899999999914712, -3.30518700000000009709, 148.01025699999999574175, 22.45390199999999936153, 0.42035000000000000142, -0.46885799999999999699, 0.76057900000000000507, -1.36164400000000007651, 177.88423800000001051558, 23.88289199999999823376, 1.21063900000000002066, -1.61197400000000001796, 8.53437800000000024170
step[14]: 0.00000400000000000000, 0.00008000000000000001, 0.00000100000000000000, 0.00003000000000000000, 0.00004000000000000000, 0.00006000000000000000, 0.00004000000000000000, 0.00000400000000000000, 0.00000100000000000000, 0.00003000000000000000, 0.00004000000000000000, 0.00006000000000000000, 0.00004000000000000000, 0.00000400000000000000

hessian[0][0] = 882.45647594797912915965, (-2.98530682678311976019 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530680027340666882)/(4 * 0.00000400000000000000 * 0.00000400000000000000)
hessian[0][1] = hessian[1][0] = -6.30326069117614817827, (-2.98530681295196531622 - -2.98530680373119539084 - -2.98530676800071281818 + -2.98530676684811657751)/(4 * 0.00000400000000000000 * 0.00008000000000000001)
hessian[0][2] = hessian[2][0] = -792.40988770656883843913, (-2.98530683369869720423 - -2.98530681525715779756 - -2.98530677952667522490 + -2.98530677376369402154)/(4 * 0.00000400000000000000 * 0.00000100000000000000)
hessian[0][3] = hessian[3][0] = -69.63602102357431533619, (-2.98530682908831224154 - -2.98530681756235027891 - -2.98530676800071281818 + -2.98530678990004094686)/(4 * 0.00000400000000000000 * 0.00003000000000000000)
hessian[0][4] = hessian[4][0] = 54.02794739373106125413, (-2.98530681525715779756 - -2.98530682908831224154 - -2.98530679335782966888 + -2.98530677261109778087)/(4 * 0.00000400000000000000 * 0.00004000000000000000)
hessian[0][5] = hessian[5][0] = 55.22856894035754748984, (-2.98530680488379163151 - -2.98530681525715779756 - -2.98530681525715779756 + -2.98530677261109778087)/(4 * 0.00000400000000000000 * 0.00006000000000000000)
hessian[0][6] = hessian[6][0] = 21.61117881871454571296, (-2.98530682447792727885 - -2.98530684061427420417 - -2.98530678298446350283 + -2.98530678528965598417)/(4 * 0.00000400000000000000 * 0.00004000000000000000)
hessian[0][7] = hessian[7][0] = -198.10247886553611351701, (-2.98530682793571600087 - -2.98530683139350472288 - -2.98530678298446350283 + -2.98530679912081087224)/(4 * 0.00000400000000000000 * 0.00000400000000000000)
hessian[0][8] = hessian[8][0] = -1152.59621291663461306598, (-2.98530683024090848221 - -2.98530682102013855683 - -2.98530677261109778087 + -2.98530678183186726216)/(4 * 0.00000400000000000000 * 0.00000100000000000000)
hessian[0][9] = hessian[9][0] = 36.01863252100656609400, (-2.98530681986754231616 - -2.98530683485129344490 - -2.98530678644225222484 + -2.98530678413705974350)/(4 * 0.00000400000000000000 * 0.00003000000000000000)
hessian[0][10] = hessian[10][0] = 10.80558975630196805184, (-2.98530681525715779756 - -2.98530681871494651958 - -2.98530678759484846552 + -2.98530678413705974350)/(4 * 0.00000400000000000000 * 0.00004000000000000000)
hessian[0][11] = hessian[11][0] = -22.81180013404456374815, (-2.98530680949417659420 - -2.98530678413705974350 - -2.98530681525715779756 + -2.98530681179936907554)/(4 * 0.00000400000000000000 * 0.00006000000000000000)
hessian[0][12] = hessian[12][0] = 90.04657991473762024270, (-2.98530678874744470619 - -2.98530683830908216692 - -2.98530680834158035353 + -2.98530680027340666882)/(4 * 0.00000400000000000000 * 0.00004000000000000000)
hessian[0][13] = hessian[13][0] = 108.05589062412579437478, (-2.98530683254610096355 - -2.98530684176687044484 - -2.98530678874744470619 + -2.98530679105263718753)/(4 * 0.00000400000000000000 * 0.00000400000000000000)
hessian[1][1] = 2.70139735233931821412, (-2.98530683485129344490 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530677952667522490)/(4 * 0.00008000000000000001 * 0.00008000000000000001)
hessian[1][2] = hessian[2][1] = 3.60186325210065616531, (-2.98530682678311976019 - -2.98530682102013855683 - -2.98530681410456155689 + -2.98530680718898411286)/(4 * 0.00008000000000000001 * 0.00000100000000000000)
hessian[1][3] = hessian[3][1] = 3.36173894277536033925, (-2.98530682908831224154 - -2.98530684522465916686 - -2.98530680027340666882 + -2.98530678413705974350)/(4 * 0.00008000000000000001 * 0.00003000000000000000)
hessian[1][4] = hessian[4][1] = -1.44074526614579290218, (-2.98530684868244788888 - -2.98530682102013855683 - -2.98530679681561839089 + -2.98530678759484846552)/(4 * 0.00008000000000000001 * 0.00004000000000000000)
hessian[1][5] = hessian[5][1] = -0.42021737941174319708, (-2.98530682563052351952 - -2.98530680718898411286 - -2.98530678874744470619 + -2.98530677837407898423)/(4 * 0.00008000000000000001 * 0.00006000000000000000)
hessian[1][6] = hessian[6][1] = 3.42176998541221477623, (-2.98530681295196531622 - -2.98530684983504412955 - -2.98530680142600290949 + -2.98530679451042590955)/(4 * 0.00008000000000000001 * 0.00004000000000000000)
hessian[1][7] = hessian[7][1] = 0.90046581302516404133, (-2.98530683485129344490 - -2.98530683485129344490 - -2.98530679105263718753 + -2.98530678990004094686)/(4 * 0.00008000000000000001 * 0.00000400000000000000)
hessian[1][8] = hessian[8][1] = 64.83353715003303818776, (-2.98530682102013855683 - -2.98530682908831224154 - -2.98530681986754231616 + -2.98530680718898411286)/(4 * 0.00008000000000000001 * 0.00000100000000000000)
hessian[1][9] = hessian[9][1] = 0.24012417054741772016, (-2.98530683369869720423 - -2.98530683024090848221 - -2.98530680142600290949 + -2.98530679566302215022)/(4 * 0.00008000000000000001 * 0.00003000000000000000)
hessian[1][10] = hessian[10][1] = 0.09004658130251640136, (-2.98530683369869720423 - -2.98530683485129344490 - -2.98530679335782966888 + -2.98530679335782966888)/(4 * 0.00008000000000000001 * 0.00004000000000000000)
hessian[1][11] = hessian[11][1] = 0.42021735628209683222, (-2.98530683139350472288 - -2.98530681179936907554 - -2.98530682217273479750 + -2.98530679451042590955)/(4 * 0.00008000000000000001 * 0.00006000000000000000)
hessian[1][12] = hessian[12][1] = 2.07107130056893806724, (-2.98530677261109778087 - -2.98530682563052351952 - -2.98530676915330905885 + -2.98530679566302215022)/(4 * 0.00008000000000000001 * 0.00004000000000000000)
hessian[1][13] = hessian[13][1] = -3.60186325210065616531, (-2.98530683254610096355 - -2.98530683139350472288 - -2.98530678528965598417 + -2.98530678874744470619)/(4 * 0.00008000000000000001 * 0.00000400000000000000)
hessian[2][2] = -1152.59635169451257752371, (-2.98530684868244788888 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530683946167840759)/(4 * 0.00000100000000000000 * 0.00000100000000000000)
hessian[2][3] = hessian[3][2] = 96.04968302194076557043, (-2.98530682217273479750 - -2.98530684637725540753 - -2.98530682217273479750 + -2.98530683485129344490)/(4 * 0.00000100000000000000 * 0.00003000000000000000)
hessian[2][4] = hessian[4][2] = -36.01863252100655898857, (-2.98530683715648592624 - -2.98530682102013855683 - -2.98530683254610096355 + -2.98530682217273479750)/(4 * 0.00000100000000000000 * 0.00004000000000000000)
hessian[2][5] = hessian[5][2] = 28.81490786717695939956, (-2.98530682447792727885 - -2.98530681756235027891 - -2.98530681756235027891 + -2.98530680373119539084)/(4 * 0.00000100000000000000 * 0.00006000000000000000)
hessian[2][6] = hessian[6][2] = 0.00000000000000000000, (-2.98530682678311976019 - -2.98530684061427420417 - -2.98530681410456155689 + -2.98530682793571600087)/(4 * 0.00000100000000000000 * 0.00004000000000000000)
hessian[2][7] = hessian[7][2] = 288.14903241247691312310, (-2.98530683715648592624 - -2.98530683139350472288 - -2.98530684291946668552 + -2.98530683254610096355)/(4 * 0.00000100000000000000 * 0.00000400000000000000)
hessian[2][8] = hessian[8][2] = 3169.63955082627535375650, (-2.98530683254610096355 - -2.98530684868244788888 - -2.98530683369869720423 + -2.98530683715648592624)/(4 * 0.00000100000000000000 * 0.00000100000000000000)
hessian[2][9] = hessian[9][2] = 19.20993364379341983295, (-2.98530683946167840759 - -2.98530683139350472288 - -2.98530682908831224154 + -2.98530681871494651958)/(4 * 0.00000100000000000000 * 0.00003000000000000000)
hessian[2][10] = hessian[10][2] = -93.64844177905949607066, (-2.98530684868244788888 - -2.98530680488379163151 - -2.98530683946167840759 + -2.98530681064677283487)/(4 * 0.00000100000000000000 * 0.00004000000000000000)
hessian[2][11] = hessian[11][2] = 0.00000185037170770859, (-2.98530682102013855683 - -2.98530679451042590955 - -2.98530681525715779756 + -2.98530678874744470619)/(4 * 0.00000100000000000000 * 0.00006000000000000000)
hessian[2][12] = hessian[12][2] = 7.20372650420131233062, (-2.98530676454292409616 - -2.98530681295196531622 - -2.98530676339032785549 + -2.98530681064677283487)/(4 * 0.00000100000000000000 * 0.00004000000000000000)
hessian[2][13] = hessian[13][2] = -432.22359025207879312802, (-2.98530683715648592624 - -2.98530682678311976019 - -2.98530682678311976019 + -2.98530682332533103818)/(4 * 0.00000100000000000000 * 0.00000400000000000000)
hessian[3][3] = 3.52182172314030594862, (-2.98530682793571600087 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530684291946668552)/(4 * 0.00003000000000000000 * 0.00003000000000000000)
hessian[3][4] = hessian[4][3] = -4.56235993429032671287, (-2.98530684176687044484 - -2.98530681871494651958 - -2.98530684522465916686 + -2.98530684407206292619)/(4 * 0.00003000000000000000 * 0.00004000000000000000)
hessian[3][5] = hessian[5][3] = 1.28066248963578899200, (-2.98530682332533103818 - -2.98530681986754231616 - -2.98530683369869720423 + -2.98530682102013855683)/(4 * 0.00003000000000000000 * 0.00006000000000000000)
hessian[3][6] = hessian[6][3] = -1.20062099151496659566, (-2.98530684637725540753 - -2.98530685098764037022 - -2.98530683254610096355 + -2.98530684291946668552)/(4 * 0.00003000000000000000 * 0.00004000000000000000)
hessian[3][7] = hessian[7][3] = -7.20372650420131321880, (-2.98530683254610096355 - -2.98530682793571600087 - -2.98530684637725540753 + -2.98530684522465916686)/(4 * 0.00003000000000000000 * 0.00000400000000000000)
hessian[3][8] = hessian[8][3] = -67.23477700513551269523, (-2.98530684407206292619 - -2.98530682332533103818 - -2.98530685444542909224 + -2.98530684176687044484)/(4 * 0.00003000000000000000 * 0.00000100000000000000)
hessian[3][9] = hessian[9][3] = 1.92099373445368359903, (-2.98530683369869720423 - -2.98530683024090848221 - -2.98530685214023661089 + -2.98530684176687044484)/(4 * 0.00003000000000000000 * 0.00003000000000000000)
hessian[3][10] = hessian[10][3] = -1.44074520832167696227, (-2.98530685098764037022 - -2.98530681756235027891 - -2.98530685329283285157 + -2.98530682678311976019)/(4 * 0.00003000000000000000 * 0.00004000000000000000)
hessian[3][11] = hessian[11][3] = -0.32016562240894724800, (-2.98530682908831224154 - -2.98530681525715779756 - -2.98530682102013855683 + -2.98530680949417659420)/(4 * 0.00003000000000000000 * 0.00006000000000000000)
hessian[3][12] = hessian[12][3] = -0.48024834109483544031, (-2.98530678298446350283 - -2.98530683139350472288 - -2.98530677952667522490 + -2.98530683024090848221)/(4 * 0.00003000000000000000 * 0.00004000000000000000)
hessian[3][13] = hessian[13][3] = 0.00000000000000000000, (-2.98530683254610096355 - -2.98530683024090848221 - -2.98530684868244788888 + -2.98530684637725540753)/(4 * 0.00003000000000000000 * 0.00000400000000000000)
hessian[4][4] = 1.08055890624125772170, (-2.98530683600388968557 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530684061427420417)/(4 * 0.00004000000000000000 * 0.00004000000000000000)
hessian[4][5] = hessian[5][4] = -2.64136629235522901737, (-2.98530684061427420417 - -2.98530681986754231616 - -2.98530681756235027891 + -2.98530682217273479750)/(4 * 0.00004000000000000000 * 0.00006000000000000000)
hessian[4][6] = hessian[6][4] = 1.62083853283423429126, (-2.98530684176687044484 - -2.98530685905581405493 - -2.98530683715648592624 + -2.98530684407206292619)/(4 * 0.00004000000000000000 * 0.00004000000000000000)
hessian[4][7] = hessian[7][4] = -1.80093162605032808266, (-2.98530685444542909224 - -2.98530685214023661089 - -2.98530683715648592624 + -2.98530683600388968557)/(4 * 0.00004000000000000000 * 0.00000400000000000000)
hessian[4][8] = hessian[8][4] = 7.20372927975887389351, (-2.98530684291946668552 - -2.98530683715648592624 - -2.98530683600388968557 + -2.98530682908831224154)/(4 * 0.00004000000000000000 * 0.00000100000000000000)
hessian[4][9] = hessian[9][4] = 5.28273267722904371624, (-2.98530684983504412955 - -2.98530684637725540753 - -2.98530683830908216692 + -2.98530680949417659420)/(4 * 0.00004000000000000000 * 0.00003000000000000000)
hessian[4][10] = hessian[10][4] = 6.30326055239827010013, (-2.98530684061427420417 - -2.98530683139350472288 - -2.98530684407206292619 + -2.98530679451042590955)/(4 * 0.00004000000000000000 * 0.00004000000000000000)
hessian[4][11] = hessian[11][4] = -2.88149055542123200269, (-2.98530682908831224154 - -2.98530680142600290949 - -2.98530681756235027891 + -2.98530681756235027891)/(4 * 0.00004000000000000000 * 0.00006000000000000000)
hessian[4][12] = hessian[12][4] = 0.18009316260503280271, (-2.98530678644225222484 - -2.98530683254610096355 - -2.98530678528965598417 + -2.98530683024090848221)/(4 * 0.00004000000000000000 * 0.00004000000000000000)
hessian[4][13] = hessian[13][4] = 5.40279487815098402592, (-2.98530683830908216692 - -2.98530683715648592624 - -2.98530683024090848221 + -2.98530682563052351952)/(4 * 0.00004000000000000000 * 0.00000400000000000000)
hessian[5][5] = 3.92202875115148996699, (-2.98530683254610096355 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530679451042590955)/(4 * 0.00006000000000000000 * 0.00006000000000000000)
hessian[5][6] = hessian[6][5] = -7.32378842756749648402, (-2.98530684291946668552 - -2.98530684176687044484 - -2.98530677145850154020 + -2.98530684061427420417)/(4 * 0.00006000000000000000 * 0.00004000000000000000)
hessian[5][7] = hessian[7][5] = 7.20372696679423984989, (-2.98530682793571600087 - -2.98530683946167840759 - -2.98530681756235027891 + -2.98530682217273479750)/(4 * 0.00006000000000000000 * 0.00000400000000000000)
hessian[5][8] = hessian[8][5] = 33.61738850256775634762, (-2.98530683715648592624 - -2.98530682447792727885 - -2.98530683600388968557 + -2.98530681525715779756)/(4 * 0.00006000000000000000 * 0.00000100000000000000)
hessian[5][9] = hessian[9][5] = 4.16215302963725708452, (-2.98530682793571600087 - -2.98530683830908216692 - -2.98530684061427420417 + -2.98530682102013855683)/(4 * 0.00006000000000000000 * 0.00003000000000000000)
hessian[5][10] = hessian[10][5] = 1.08055892937090414208, (-2.98530683139350472288 - -2.98530682217273479750 - -2.98530683254610096355 + -2.98530681295196531622)/(4 * 0.00006000000000000000 * 0.00004000000000000000)
hessian[5][11] = hessian[11][5] = 0.80041402518283966128, (-2.98530681640975403823 - -2.98530680142600290949 - -2.98530680718898411286 + -2.98530678067927102148)/(4 * 0.00006000000000000000 * 0.00006000000000000000)
hessian[5][12] = hessian[12][5] = -0.72037265042013121086, (-2.98530679220523342821 - -2.98530680949417659420 - -2.98530678644225222484 + -2.98530681064677283487)/(4 * 0.00006000000000000000 * 0.00004000000000000000)
hessian[5][13] = hessian[13][5] = 1.20062108403355227715, (-2.98530683830908216692 - -2.98530683369869720423 - -2.98530682563052351952 + -2.98530681986754231616)/(4 * 0.00006000000000000000 * 0.00000400000000000000)
hessian[6][6] = 1.08055883685231868263, (-2.98530683830908216692 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530683830908216692)/(4 * 0.00004000000000000000 * 0.00004000000000000000)
hessian[6][7] = hessian[7][6] = -12.60652068846290596582, (-2.98530684176687044484 - -2.98530683600388968557 - -2.98530685214023661089 + -2.98530685444542909224)/(4 * 0.00004000000000000000 * 0.00000400000000000000)
hessian[6][8] = hessian[8][6] = -108.05589756301968407115, (-2.98530685329283285157 - -2.98530682102013855683 - -2.98530684752985164820 + -2.98530683254610096355)/(4 * 0.00004000000000000000 * 0.00000100000000000000)
hessian[6][9] = hessian[9][6] = -2.88149050916193960603, (-2.98530685329283285157 - -2.98530683369869720423 - -2.98530685098764037022 + -2.98530684522465916686)/(4 * 0.00004000000000000000 * 0.00003000000000000000)
hessian[6][10] = hessian[10][6] = -3.24167685750165146530, (-2.98530683600388968557 - -2.98530681525715779756 - -2.98530684291946668552 + -2.98530684291946668552)/(4 * 0.00004000000000000000 * 0.00004000000000000000)
hessian[6][11] = hessian[11][6] = 4.08211154693619882039, (-2.98530683139350472288 - -2.98530682102013855683 - -2.98530686481879481420 + -2.98530681525715779756)/(4 * 0.00004000000000000000 * 0.00006000000000000000)
hessian[6][12] = hessian[12][6] = 1.08055897563019676078, (-2.98530680027340666882 - -2.98530684522465916686 - -2.98530680373119539084 + -2.98530684176687044484)/(4 * 0.00004000000000000000 * 0.00004000000000000000)
hessian[6][13] = hessian[13][6] = 14.40745231451323427052, (-2.98530683600388968557 - -2.98530684637725540753 - -2.98530683830908216692 + -2.98530683946167840759)/(4 * 0.00004000000000000000 * 0.00000400000000000000)
hessian[7][7] = 36.01862558211266218677, (-2.98530684176687044484 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530683946167840759)/(4 * 0.00000400000000000000 * 0.00000400000000000000)
hessian[7][8] = hessian[8][7] = -648.33538537811818969203, (-2.98530684868244788888 - -2.98530683369869720423 - -2.98530684291946668552 + -2.98530683830908216692)/(4 * 0.00000400000000000000 * 0.00000100000000000000)
hessian[7][9] = hessian[9][7] = -2.40124216806710455430, (-2.98530684752985164820 - -2.98530683715648592624 - -2.98530684522465916686 + -2.98530683600388968557)/(4 * 0.00000400000000000000 * 0.00003000000000000000)
hessian[7][10] = hessian[10][7] = -45.02328926347941973063, (-2.98530684868244788888 - -2.98530681179936907554 - -2.98530683600388968557 + -2.98530682793571600087)/(4 * 0.00000400000000000000 * 0.00004000000000000000)
hessian[7][11] = hessian[11][7] = -25.21304230211167052289, (-2.98530683946167840759 - -2.98530680488379163151 - -2.98530681640975403823 + -2.98530680603638787218)/(4 * 0.00000400000000000000 * 0.00006000000000000000)
hessian[7][12] = hessian[12][7] = -19.81024719266421740826, (-2.98530679220523342821 - -2.98530683139350472288 - -2.98530677722148274356 + -2.98530682908831224154)/(4 * 0.00000400000000000000 * 0.00004000000000000000)
hessian[7][13] = hessian[13][7] = -36.01863252100656609400, (-2.98530684637725540753 - -2.98530683946167840759 - -2.98530684061427420417 + -2.98530683600388968557)/(4 * 0.00000400000000000000 * 0.00000400000000000000)
hessian[8][8] = 6051.13004148449817876099, (-2.98530682332533103818 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530683600388968557)/(4 * 0.00000100000000000000 * 0.00000100000000000000)
hessian[8][9] = hessian[9][8] = -76.83974937814734573749, (-2.98530685559802533291 - -2.98530684061427420417 - -2.98530683485129344490 + -2.98530682908831224154)/(4 * 0.00000100000000000000 * 0.00003000000000000000)
hessian[8][10] = hessian[10][8] = 201.70433656652161857892, (-2.98530685559802533291 - -2.98530684868244788888 - -2.98530684868244788888 + -2.98530680949417659420)/(4 * 0.00000100000000000000 * 0.00004000000000000000)
hessian[8][11] = hessian[11][8] = -24.01241983029933635407, (-2.98530682678311976019 - -2.98530681064677283487 - -2.98530681295196531622 + -2.98530680257859915017)/(4 * 0.00000100000000000000 * 0.00006000000000000000)
hessian[8][12] = hessian[12][8] = 151.27825103711242604732, (-2.98530678874744470619 - -2.98530684061427420417 - -2.98530676800071281818 + -2.98530679566302215022)/(4 * 0.00000100000000000000 * 0.00004000000000000000)
hessian[8][13] = hessian[13][8] = -216.11179512603939656401, (-2.98530684752985164820 - -2.98530683369869720423 - -2.98530684061427420417 + -2.98530683024090848221)/(4 * 0.00000100000000000000 * 0.00000400000000000000)
hessian[9][9] = 1.28066236627767526812, (-2.98530685098764037022 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530682793571600087)/(4 * 0.00003000000000000000 * 0.00003000000000000000)
hessian[9][10] = hessian[10][9] = -1.68086951764697278833, (-2.98530685675062157358 - -2.98530683024090848221 - -2.98530684061427420417 + -2.98530682217273479750)/(4 * 0.00003000000000000000 * 0.00004000000000000000)
hessian[9][11] = hessian[11][9] = 2.08107648397910027782, (-2.98530682908831224154 - -2.98530680373119539084 - -2.98530682908831224154 + -2.98530678874744470619)/(4 * 0.00003000000000000000 * 0.00006000000000000000)
hessian[9][12] = hessian[12][9] = -6.96360219487601650457, (-2.98530678874744470619 - -2.98530682447792727885 - -2.98530676800071281818 + -2.98530683715648592624)/(4 * 0.00003000000000000000 * 0.00004000000000000000)
hessian[9][13] = hessian[13][9] = 14.40745300840262643760, (-2.98530684752985164820 - -2.98530683369869720423 - -2.98530684868244788888 + -2.98530682793571600087)/(4 * 0.00003000000000000000 * 0.00000400000000000000)
hessian[10][10] = 5.76298106458317160872, (-2.98530684061427420417 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530680603638787218)/(4 * 0.00004000000000000000 * 0.00004000000000000000)
hessian[10][11] = hessian[11][10] = 0.24012421680671039437, (-2.98530680142600290949 - -2.98530678990004094686 - -2.98530683024090848221 + -2.98530681640975403823)/(4 * 0.00004000000000000000 * 0.00006000000000000000)
hessian[10][12] = hessian[12][10] = 2.88149053229158580436, (-2.98530679335782966888 - -2.98530684637725540753 - -2.98530677837407898423 + -2.98530681295196531622)/(4 * 0.00004000000000000000 * 0.00004000000000000000)
hessian[10][13] = hessian[13][10] = -10.80558975630196805184, (-2.98530685790321781425 - -2.98530684983504412955 - -2.98530681871494651958 + -2.98530681756235027891)/(4 * 0.00004000000000000000 * 0.00000400000000000000)
hessian[11][11] = 3.28169753717312406849, (-2.98530682678311976019 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530680949417659420)/(4 * 0.00006000000000000000 * 0.00006000000000000000)
hessian[11][12] = hessian[12][11] = 2.28117996714516335643, (-2.98530679335782966888 - -2.98530682217273479750 - -2.98530677145850154020 + -2.98530677837407898423)/(4 * 0.00006000000000000000 * 0.00004000000000000000)
hessian[11][13] = hessian[13][11] = -2.40124216806710455430, (-2.98530682447792727885 - -2.98530682563052351952 - -2.98530680718898411286 + -2.98530681064677283487)/(4 * 0.00006000000000000000 * 0.00000400000000000000)
hessian[12][12] = 20.35052688864613301689, (-2.98530677145850154020 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530678183186726216)/(4 * 0.00004000000000000000 * 0.00004000000000000000)
hessian[12][13] = hessian[13][12] = -16.20838463445295474230, (-2.98530677145850154020 - -2.98530677030590529952 - -2.98530682102013855683 + -2.98530683024090848221)/(4 * 0.00004000000000000000 * 0.00000400000000000000)
hessian[13][13] = 198.10247192664220960978, (-2.98530684407206292619 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530682678311976019)/(4 * 0.00000400000000000000 * 0.00000400000000000000)
gradient[0]: -0.00633927915716370194, (-2.98530684061427420417 - -2.98530678990004094686)/(2 * 0.000004)
gradient[1]: -0.00025933414860013215, (-2.98530683254610096355 - -2.98530679105263718753)/(2 * 0.000080)
gradient[2]: -0.00576298120336105058, (-2.98530683946167840759 - -2.98530682793571600087)/(2 * 0.000001)
gradient[3]: 0.00028814905276656572, (-2.98530683024090848221 - -2.98530684752985164820)/(2 * 0.000030)
gradient[4]: -0.00011525961851610587, (-2.98530684176687044484 - -2.98530683254610096355)/(2 * 0.000040)
gradient[5]: -0.00010565465539495258, (-2.98530683369869720423 - -2.98530682102013855683)/(2 * 0.000060)
gradient[6]: 0.00011525961851610587, (-2.98530683715648592624 - -2.98530684637725540753)/(2 * 0.000040)
gradient[7]: 0.00086444718050415759, (-2.98530684061427420417 - -2.98530684752985164820)/(2 * 0.000004)
gradient[8]: -0.00518668286098034059, (-2.98530684291946668552 - -2.98530683254610096355)/(2 * 0.000001)
gradient[9]: -0.00013446955401027102, (-2.98530684407206292619 - -2.98530683600388968557)/(2 * 0.000030)
gradient[10]: -0.00037459377266735311, (-2.98530684983504412955 - -2.98530681986754231616)/(2 * 0.000040)
gradient[11]: -0.00012486458903874600, (-2.98530682332533103818 - -2.98530680834158035353)/(2 * 0.000060)
gradient[12]: 0.00048985339118345905, (-2.98530678528965598417 - -2.98530682447792727885)/(2 * 0.000040)
gradient[13]: -0.00057629806482495383, (-2.98530684291946668552 - -2.98530683830908216692)/(2 * 0.000004)
line search starting at fitness: -2.985306841766870
initial point: [14]: 0.57171300000000002672, 12.31211899999999914712, -3.30518700000000009709, 148.01025699999999574175, 22.45390199999999936153, 0.42035000000000000142, -0.46885799999999999699, 0.76057900000000000507, -1.36164400000000007651, 177.88423800000001051558, 23.88289199999999823376, 1.21063900000000002066, -1.61197400000000001796, 8.53437800000000024170
loop 1, evaluations: 1, step: 1.000000000000000, fitness: -2.985306747253981
loop 2, evaluations: 2, step: 2.000000000000000, fitness: -2.985306733422826
loop 2, evaluations: 3, step: 4.000000000000000, fitness: -2.985306850987640
loop 3, evaluations: 4, step: 1.785714283530075, fitness: -2.985306755322155
loop 3, evaluations: 5, step: 2.595721102268431, fitness: -2.985306818714947
loop 3, evaluations: 6, step: 2.061540494505907, fitness: -2.985306771458502
loop 3, evaluations: 7, step: 1.912425577977664, fitness: -2.985306748406577
loop 3, evaluations: 8, step: 1.972377618796003, fitness: -2.985306729965038
loop 3, evaluations: 9, step: 1.973523614802404, fitness: -2.985306733422826
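
The gradient[...] and hessian[...] lines in the log above print central-difference quotients of the fitness function: two evaluations per gradient entry and four per Hessian entry (with the unshifted point reused on the diagonal). A minimal Python sketch of that pattern follows. The function names, the toy quadratic, and the step sizes are assumptions made for illustration and are not taken from the milkyway source.

```python
def central_gradient(f, x, h):
    """gradient[i] = (f(x + h_i e_i) - f(x - h_i e_i)) / (2 h_i)."""
    grad = []
    for i in range(len(x)):
        plus, minus = list(x), list(x)
        plus[i] += h[i]
        minus[i] -= h[i]
        grad.append((f(plus) - f(minus)) / (2.0 * h[i]))
    return grad


def central_hessian(f, x, h):
    """hessian[i][j] = (f(++) - f(+-) - f(-+) + f(--)) / (4 h_i h_j)."""
    n = len(x)
    hess = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            def shifted(si, sj):
                y = list(x)
                y[i] += si * h[i]
                y[j] += sj * h[j]  # for i == j this gives x +/- 2h or x itself
                return f(y)
            value = (shifted(1, 1) - shifted(1, -1)
                     - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * h[i] * h[j])
            hess[i][j] = hess[j][i] = value  # symmetric, as in the log
    return hess


def toy_fitness(p):
    # Hypothetical stand-in for the real likelihood: a simple quadratic.
    return p[0] ** 2 + 3.0 * p[0] * p[1] + 2.0 * p[1] ** 2


toy_grad = central_gradient(toy_fitness, [1.0, 2.0], [1e-3, 1e-3])
toy_hess = central_hessian(toy_fitness, [1.0, 2.0], [1e-3, 1e-3])

# The same quotient, fed with the two fitness values printed for gradient[0].
log_gradient0 = (-2.98530684061427420417 - -2.98530678990004094686) / (2 * 0.000004)
print(toy_grad, toy_hess, log_gradient0)
```

For the toy quadratic the exact gradient at (1, 2) is (8, 11) and the exact Hessian is [[2, 3], [3, 4]], so the output can be checked by hand.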

trisf
Joined: 30 Nov 08
Posts: 11
Credit: 25,658
RAC: 0
Message 23920 - Posted: 2 Jun 2009 | 8:04:38 UTC

out file

hessian [14 x 14]:
882.45647594797912915965 -6.30326069117614817827 -792.40988770656883843913 -69.63602102357431533619 54.02794739373106125413 55.22856894035754748984 21.61117881871454571296 -198.10247886553611351701 -1152.59621291663461306598 36.01863252100656609400 10.80558975630196805184 -22.81180013404456374815 90.04657991473762024270 108.05589062412579437478
-6.30326069117614817827 2.70139735233931821412 3.60186325210065616531 3.36173894277536033925 -1.44074526614579290218 -0.42021737941174319708 3.42176998541221477623 0.90046581302516404133 64.83353715003303818776 0.24012417054741772016 0.09004658130251640136 0.42021735628209683222 2.07107130056893806724 -3.60186325210065616531
-792.40988770656883843913 3.60186325210065616531 -1152.59635169451257752371 96.04968302194076557043 -36.01863252100655898857 28.81490786717695939956 0.00000000000000000000 288.14903241247691312310 3169.63955082627535375650 19.20993364379341983295 -93.64844177905949607066 0.00000185037170770859 7.20372650420131233062 -432.22359025207879312802
-69.63602102357431533619 3.36173894277536033925 96.04968302194076557043 3.52182172314030594862 -4.56235993429032671287 1.28066248963578899200 -1.20062099151496659566 -7.20372650420131321880 -67.23477700513551269523 1.92099373445368359903 -1.44074520832167696227 -0.32016562240894724800 -0.48024834109483544031 0.00000000000000000000
54.02794739373106125413 -1.44074526614579290218 -36.01863252100655898857 -4.56235993429032671287 1.08055890624125772170 -2.64136629235522901737 1.62083853283423429126 -1.80093162605032808266 7.20372927975887389351 5.28273267722904371624 6.30326055239827010013 -2.88149055542123200269 0.18009316260503280271 5.40279487815098402592
55.22856894035754748984 -0.42021737941174319708 28.81490786717695939956 1.28066248963578899200 -2.64136629235522901737 3.92202875115148996699 -7.32378842756749648402 7.20372696679423984989 33.61738850256775634762 4.16215302963725708452 1.08055892937090414208 0.80041402518283966128 -0.72037265042013121086 1.20062108403355227715
21.61117881871454571296 3.42176998541221477623 0.00000000000000000000 -1.20062099151496659566 1.62083853283423429126 -7.32378842756749648402 1.08055883685231868263 -12.60652068846290596582 -108.05589756301968407115 -2.88149050916193960603 -3.24167685750165146530 4.08211154693619882039 1.08055897563019676078 14.40745231451323427052
-198.10247886553611351701 0.90046581302516404133 288.14903241247691312310 -7.20372650420131321880 -1.80093162605032808266 7.20372696679423984989 -12.60652068846290596582 36.01862558211266218677 -648.33538537811818969203 -2.40124216806710455430 -45.02328926347941973063 -25.21304230211167052289 -19.81024719266421740826 -36.01863252100656609400
-1152.59621291663461306598 64.83353715003303818776 3169.63955082627535375650 -67.23477700513551269523 7.20372927975887389351 33.61738850256775634762 -108.05589756301968407115 -648.33538537811818969203 6051.13004148449817876099 -76.83974937814734573749 201.70433656652161857892 -24.01241983029933635407 151.27825103711242604732 -216.11179512603939656401
36.01863252100656609400 0.24012417054741772016 19.20993364379341983295 1.92099373445368359903 5.28273267722904371624 4.16215302963725708452 -2.88149050916193960603 -2.40124216806710455430 -76.83974937814734573749 1.28066236627767526812 -1.68086951764697278833 2.08107648397910027782 -6.96360219487601650457 14.40745300840262643760
10.80558975630196805184 0.09004658130251640136 -93.64844177905949607066 -1.44074520832167696227 6.30326055239827010013 1.08055892937090414208 -3.24167685750165146530 -45.02328926347941973063 201.70433656652161857892 -1.68086951764697278833 5.76298106458317160872 0.24012421680671039437 2.88149053229158580436 -10.80558975630196805184
-22.81180013404456374815 0.42021735628209683222 0.00000185037170770859 -0.32016562240894724800 -2.88149055542123200269 0.80041402518283966128 4.08211154693619882039 -25.21304230211167052289 -24.01241983029933635407 2.08107648397910027782 0.24012421680671039437 3.28169753717312406849 2.28117996714516335643 -2.40124216806710455430
90.04657991473762024270 2.07107130056893806724 7.20372650420131233062 -0.48024834109483544031 0.18009316260503280271 -0.72037265042013121086 1.08055897563019676078 -19.81024719266421740826 151.27825103711242604732 -6.96360219487601650457 2.88149053229158580436 2.28117996714516335643 20.35052688864613301689 -16.20838463445295474230
108.05589062412579437478 -3.60186325210065616531 -432.22359025207879312802 0.00000000000000000000 5.40279487815098402592 1.20062108403355227715 14.40745231451323427052 -36.01863252100656609400 -216.11179512603939656401 14.40745300840262643760 -10.80558975630196805184 -2.40124216806710455430 -16.20838463445295474230 198.10247192664220960978
gradient[14]: -0.00633927915716370194, -0.00025933414860013215, -0.00576298120336105058, 0.00028814905276656572, -0.00011525961851610587, -0.00010565465539495258, 0.00011525961851610587, 0.00086444718050415759, -0.00518668286098034059, -0.00013446955401027102, -0.00037459377266735311, -0.00012486458903874600, 0.00048985339118345905, -0.00057629806482495383
initial_fitness: -2.98530684176687044484
inital_parameters[14]: 0.57171300000000002672, 12.31211899999999914712, -3.30518700000000009709, 148.01025699999999574175, 22.45390199999999936153, 0.42035000000000000142, -0.46885799999999999699, 0.76057900000000000507, -1.36164400000000007651, 177.88423800000001051558, 23.88289199999999823376, 1.21063900000000002066, -1.61197400000000001796, 8.53437800000000024170
result_fitness: -2.98530673342282648619
result_parameters[14]: 0.57171023310301738451, 12.31159045077501801302, -3.30517495098843516743, 148.01059727942629251629, 22.45401028483425420745, 0.42010611491115568139, -0.46892261685864594645, 0.76055531511989238336, -1.36164776621840832860, 177.88429089670174221283, 23.88286053550067578044, 1.21053790484657564086, -1.61181317710548888122, 8.53439162571984688554
number_evaluations: 444
metadata: it: 5, ev: 588

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 23935 - Posted: 2 Jun 2009 | 9:44:28 UTC - in response to Message 23920.

Sweet. Looks like you're getting what I'm getting. Is that for stripe 20?
____________

trisf
Joined: 30 Nov 08
Posts: 11
Credit: 25,658
RAC: 0
Message 23953 - Posted: 2 Jun 2009 | 11:40:52 UTC

yes for 20

Cluster Physik
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 23998 - Posted: 2 Jun 2009 | 21:12:16 UTC - in response to Message 23935.

initial likelihood: -2.98530684176687044484

Sweet. Looks like you're getting what I'm getting. Is that for stripe 20?

If that is okay, I guess my single precision ATI implementation will do it, too.

I'm getting a fitness of -2.985312812926748 for the stripe20 test unit.
As a reference point, the stock CPU app (using a complete DP calculation) arrives at a value of -2.985312797571472.

So it appears my approach is actually a bit (two digits, i.e. 6 bits) more precise :D

At the moment I'm using the integration layout mentioned in the other thread (mu-r plane) and doing all summations with the Kahan method. This includes the convolution loop and the summation of all the values between different mu-r planes, as well as the final reduction (done on the GPU in SP as a tree-like Kahan sum). That way I have to transfer virtually nothing (16 bytes or so for the whole integral) back from the GPU to the CPU. As far as I can tell, it is unnecessary to do the reduction on the CPU. But I have to mention that I do all CPU operations (including the likelihood computation) in DP. I still have to test whether one loses precision there (or have you tried it already, Travis?).
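
For readers who have not met the Kahan method: it carries a second variable holding the rounding error of each addition and feeds it back into the next one. The sketch below is a plain sequential version in Python, not the tree-like GPU reduction described above. Single precision is emulated by rounding through `struct`, and the helper names and the test data (100000 copies of 0.1) are assumptions chosen only for illustration.

```python
import math
import struct


def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def naive_sum_f32(values):
    """Plain running sum with every intermediate rounded to single precision."""
    total = 0.0
    for v in values:
        total = f32(total + f32(v))
    return total


def kahan_sum_f32(values):
    """Kahan (compensated) sum with every intermediate rounded to single precision."""
    total = 0.0
    comp = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = f32(f32(v) - comp)          # apply the correction to the new term
        t = f32(total + y)              # low-order bits of y are lost here
        comp = f32(f32(t - total) - y)  # recover what was lost
        total = t
    return total


values = [0.1] * 100000
reference = math.fsum(f32(v) for v in values)  # accurately rounded double sum
naive = naive_sum_f32(values)
kahan = kahan_sum_f32(values)

print("naive error:", abs(naive - reference))
print("kahan error:", abs(kahan - reference))
```

The naive single precision sum drifts visibly once the running total is much larger than the individual terms, while the compensated sum stays within about one unit in the last place of the result.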

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 24013 - Posted: 3 Jun 2009 | 0:54:44 UTC - in response to Message 23998.

I'm working on getting Kahan summation implemented to see how much it improves the accuracy.

The general plan for milkyway_gpu is to have 2 types of applications:

1. single precision (probably using Kahan summation in the kernel for highest accuracy) for GPUs that don't support double precision

2. double precision for GPUs that do support double precision

The server will have different validation for comparing single <-> single, single <-> double, and double <-> double precision results, and we'll update our searches using the single precision values for exploration and the double precision values for highly accurate exploitation.

I'm hoping to have the code updated tomorrow with double and single precision kernels (with the single precision kernel doing a Kahan summation). I'll post the values I'm getting for the different streams made available in the code package and we can work from there.

After that the main priority will be getting workunits available on milkyway_gpu.
____________

trisf
Joined: 30 Nov 08
Posts: 11
Credit: 25,658
RAC: 0
Message 24210 - Posted: 4 Jun 2009 | 22:34:26 UTC
Last modified: 4 Jun 2009 | 22:52:23 UTC

CPU (Intel Core 2 E6750) vs GPU (NVIDIA GeForce 9600 GT) computation times on linux_x86_64 for stripe-20. Could you explain this?

CPU:
real 4m15.913s
user 4m6.327s
sys 0m0.800s

GPU:
real 15m45.243s
user 15m30.878s
sys 0m0.484s

About the makefile: what is the difference between these two lines?
48: LINUX_LDFLAGS_i686 = -L/usr/X11R6/lib -L/usr/local/lib
53: LINUX_LDFLAGS_x86_64 = -L/usr/local/lib

Profile speedimic
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 24314 - Posted: 5 Jun 2009 | 23:04:39 UTC - in response to Message 24210.

looks like the cpu is faster...


CPU (Intel Core 2 E6750) vs GPU (NVIDIA GeForce 9600 GT) computation times on linux_x86_64 for stripe-20. Could you explain this?

CPU:
real 4m15.913s
user 4m6.327s
sys 0m0.800s

GPU:
real 15m45.243s
user 15m30.878s
sys 0m0.484s

About the makefile: what is the difference between these two lines?
48: LINUX_LDFLAGS_i686 = -L/usr/X11R6/lib -L/usr/local/lib
53: LINUX_LDFLAGS_x86_64 = -L/usr/local/lib


____________
mic.


Cluster Physik
Send message
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 24325 - Posted: 6 Jun 2009 | 0:00:30 UTC - in response to Message 24314.

looks like the cpu is faster...

Or the GPU calculates quite a bit more ;)

Profile speedimic
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 24391 - Posted: 6 Jun 2009 | 17:58:59 UTC - in response to Message 24325.

looks like the cpu is faster...

Or the GPU calculates quite a bit more ;)


;))
____________
mic.


Profile The Gas Giant
Joined: 24 Dec 07
Posts: 1947
Credit: 240,865,573
RAC: 0
Message 24441 - Posted: 7 Jun 2009 | 5:18:51 UTC - in response to Message 24391.

looks like the cpu is faster...

Or the GPU calculates quite a bit more ;)


;))

As long as the credit is appropriate ;)

Cluster Physik
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 24691 - Posted: 9 Jun 2009 | 16:58:28 UTC - in response to Message 24391.
Last modified: 9 Jun 2009 | 17:02:04 UTC

looks like the cpu is faster...

Or the GPU calculates quite a bit more ;)


;))

I guess the real reason is that the test units are much smaller than the real production units, so the performance is severely limited by the administrative overhead. The integrals of current production WUs are a factor of 16 larger than those of the test WUs. And I don't know what the CPU has actually done to arrive at the numbers posted by trisf. Rest assured that even mainstream GPUs (let alone high-end ones) will be faster than the CPU.

trisf
Joined: 30 Nov 08
Posts: 11
Credit: 25,658
RAC: 0
Message 25124 - Posted: 12 Jun 2009 | 7:04:05 UTC

How do I build a static GPU (CUDA) binary?

dynamic: ok
static: /usr/bin/ld: cannot find -lcudart

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 25591 - Posted: 15 Jun 2009 | 22:59:31 UTC - in response to Message 25124.

How do I build a static GPU (CUDA) binary?

dynamic: ok
static: /usr/bin/ld: cannot find -lcudart


You need to have the CUDA compiler (nvcc), runtime and drivers installed to be able to compile the application.
____________

Cluster Physik
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 26112 - Posted: 21 Jun 2009 | 4:00:39 UTC - in response to Message 23998.
Last modified: 21 Jun 2009 | 4:24:58 UTC

initial likelihood: -2.98530684176687044484

Sweet. Looks like you're getting what I'm getting. Is that for stripe 20?

If that is okay, I guess my single precision ATI implementation will do it, too.

I'm getting a fitness of -2.985312812926748 for the stripe20 test unit.
As a reference point, the stock CPU app (using a complete DP calculation) arrives at a value of -2.985312797571472.

So it appears my approach is actually a bit (two digits, i.e. 6 bits) more precise :D

At the moment I'm using the integration layout mentioned in the other thread (mu-r plane) and doing all summations with the Kahan method. This includes the convolution loop and the summation of all the values between different mu-r planes, as well as the final reduction (done on the GPU in SP as a tree-like Kahan sum). That way I have to transfer virtually nothing (16 bytes or so for the whole integral) back from the GPU to the CPU. As far as I can tell, it is unnecessary to do the reduction on the CPU. But I have to mention that I do all CPU operations (including the likelihood computation) in DP. I still have to test whether one loses precision there (or have you tried it already, Travis?).


I have an update to this. Sorry for putting it here and not in the other thread, but the comparison values posted by trisf are here.

I now do the likelihood computation on the GPU as well. Since the GPU project code does it several hundred times (as opposed to only once in the legacy version), it is definitely faster than on the CPU. And I can now clearly state that the likelihood computation is not hurting the precision. I tuned it a bit and am now getting an initial likelihood value of -2.985312794995786 (the double precision CPU code arrives at -2.985312797571472). Not that bad for single precision, I would say :D

The complete output file (sorry for the font size ;):

hessian [14 x 14]:
-390.01297330587551000 19.52481024081187000 -1533.2734804029969000 0.36626257582383914000 0.02027267242965535500 -0.05197370311904592200 -0.03842967610800939600 0.69457634088720965000 1579.36089206778270000 0.17400710502120850000 0.47681164572210827000 -0.03224550256438382000 -0.38525987955395630000 -2.27792784635028060000
19.52481024081187000 -0.89935716859890202000 79.88489592047896800000 -0.00695203154303195800 -0.00148870499261377610 -0.00311565588143973040 -0.00698965191281430890 -0.06931503981899565800 -77.84448502468065100000 0.01937431696556283400 0.03816721244609410500 0.00344173763563067730 -0.01042395336714463200 0.23676720306564644000
-1533.27348040299690000 79.88489592047896800000 -6506.29849996420310000 -10.64588417420964100000 0.71608552421054117000 -0.34682442103436034000 -0.69671490798839375000 -10.06847383244746700000 6213.89129040750320000 1.37108842797791410000 0.75474071437042756000 0.17694549529304973000 0.88234974882084305000 -405.46663249152459000
0.36626257582383914000 -0.00695203154303195800 -10.64588417420964100000 -0.35369263073903312000 0.14107354173731321000 0.13820025371005487000 0.23365800035553733000 0.46932180364223086000 8.19744272462230580000 -0.03979594431768873600 -0.02920173362378856000 0.01743864312212887700 0.00161407924063420640 -0.14854321476557666000
0.02027267242965535500 -0.00148870499261377610 0.71608552421054117000 0.14107354173731321000 -0.16454788920317040000 0.01043720665450109500 0.01784725145448362200 -0.13678780330650397000 -6.07685013420677840000 0.00068149189994907511 0.01583087827494722100 -0.01504449342881741600 -0.03803707349092632500 0.32327057697401068000
-0.05197370311904592200 -0.00311565588143973040 -0.34682442103436034000 0.13820025371005487000 0.01043720665450109500 -0.08778481028512727400 0.03903026050503891100 0.09120667184466431400 0.24538889438948294000 0.02641849701963868200 -0.00499808527898437570 -0.01218546868347263800 0.00932971292814480800 -0.03992223218673984800
-0.03842967610800939600 -0.00698965191281430890 -0.69671490798839375000 0.23365800035553733000 0.01784725145448362200 0.03903026050503891100 -0.06100904503814063400 0.21312882014790088000 -0.70542460761657810000 -0.02401597439434984000 0.01272565386400969900 0.01001337901485044100 -0.01750821709833871500 0.00449987269668383670
0.69457634088720965000 -0.06931503981899565800 -10.06847383244746700000 0.46932180364223086000 -0.13678780330650397000 0.09120667184466431400 0.21312882014790088000 -12.38068675357695300000 10.24474949318232600000 -0.18561448674366451000 -0.34162186968167413000 0.08297251774536107400 -0.04154662724964452300 0.52725879218229466000
1579.36089206778270000000 -77.84448502468065100000 6213.89129040750320000000 8.19744272462230580000 -6.07685013420677840000 0.24538889438948294000 -0.70542460761657810000 10.24474949318232600000 -6570.86651756344510000000 -1.84772567616657100000 -0.98881736132483400000 -2.05113148687985360000 -0.64424021672948573000 -423.39445838202039000000
0.17400710502120850000 0.01937431696556283400 1.37108842797791410000 -0.03979594431768873600 0.00068149189994907511 0.02641849701963868200 -0.02401597439434984000 -0.18561448674366451000 -1.84772567616657100000 -1.68289518123445860000 0.43500184935633485000 -0.83326524692574189000 0.08769457382484800700 0.43554789404727973000
0.47681164572210827000 0.03816721244609410500 0.75474071437042756000 -0.02920173362378856000 0.01583087827494722100 -0.00499808527898437570 0.01272565386400969900 -0.34162186968167413000 -0.98881736132483400000 0.43500184935633485000 -0.32984580344841413000 -0.03374971598487282200 0.00027394753132625732 10.65527388544040700000
-0.03224550256438382000 0.00344173763563067730 0.17694549529304973000 0.01743864312212887700 -0.01504449342881741600 -0.01218546868347263800 0.01001337901485044100 0.08297251774536107400 -2.05113148687985360000 -0.83326524692574189000 -0.03374971598487282200 0.03124775130005888100 -0.79111966977407622000 -6.42424817047052970000
-0.38525987955395630000 -0.01042395336714463200 0.88234974882084305000 0.00161407924063420640 -0.03803707349092632500 0.00932971292814480800 -0.01750821709833871500 -0.04154662724964452300 -0.64424021672948573000 0.08769457382484800700 0.00027394753132625732 -0.79111966977407622000 0.02997525838654979300 -10.10300454407086900000
-2.27792784635028060000 0.23676720306564644000 -405.46663249152459000000 -0.14854321476557666000 0.32327057697401068000 -0.03992223218673984800 0.00449987269668383670 0.52725879218229466000 -423.39445838202039000000 0.43554789404727973000 10.65527388544040700000 -6.42424817047052970000 -10.10300454407086900000 -98.01699035749678000000

gradient[14]: -0.002031011692160689, 3.870287978990916e-005, -0.00391933197008143, -4.165342145275493e-006, 3.936584391794895e-006, -4.114278547480884e-005, 3.338801457530849e-005, -3.847977492199561e-006, 0.005981095174689699, 6.980741170300082e-005, 4.713984758097922e-005, -2.486414777772931e-006, 4.213692172960747e-005, 0.0004124264263438704

initial_fitness: -2.98531279499578610000
inital_parameters[14]: 0.571713, 12.312119, -3.305187, 148.010257, 22.453902, 0.42035, -0.468858, 0.760579, -1.361644, 177.884238, 23.882892, 1.210639, -1.611974, 8.534378
result_fitness: -2.98531254274110090000
result_parameters[14]: 0.570715379831475, 12.31644736374576, -3.305076710884835, 148.0107118875137, 22.46190531351187, 0.4232579677285489, -0.4614999752737721, 0.7603950396852607, -1.361832050323567, 177.8835356891613, 23.88085791689922, 1.211781892132873, -1.610141258371338, 8.534315057698125
number_evaluations: 445
metadata: it: 5, ev: 588


By the way, that small test unit took about 4 minutes, i.e. half a second per evaluation ;)
But as it is a two stream WU with 80 x 800 x 350 spatial and 60 convolution steps, one can estimate the time for the bigger WUs (neglecting that the efficiency rises slightly with the size). Current production WUs have up to 320 x 1600 x 700 spatial and 120 convolution steps. That is a factor of 4 x 2 x 2 x 2 = 32 bigger. It would take about two hours in total for about 445 * 3.6 TFlop ~ 1.6 Peta(!)Flop. That is about 220 GFlop/s with an overclocked (10%) HD3870.
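
The estimate above can be retraced step by step. The short Python sketch below uses only figures quoted in this post (grid sizes, about 4 minutes for the test unit, 445 evaluations, 3.6 TFlop per production evaluation) and assumes, as the post does, that run time scales linearly with the problem size. The variable names are illustrative.

```python
# Grid sizes as (spatial, spatial, spatial, convolution steps).
test_dims = (80, 800, 350, 60)
production_dims = (320, 1600, 700, 120)

# Size factor between production and test WUs: 4 x 2 x 2 x 2.
scale = 1
for test, production in zip(test_dims, production_dims):
    scale *= production // test

test_minutes = 4                            # "about 4 minutes" for the test unit
production_minutes = test_minutes * scale   # linear scaling assumption

evaluations = 445
tflop_per_evaluation = 3.6                  # production-size figure from the post
total_tflop = evaluations * tflop_per_evaluation

# The post rounds the run time to "about two hours".
rate_gflops = total_tflop * 1000.0 / (2 * 3600)

print("size factor:", scale)
print("estimated run time (minutes):", production_minutes)
print("total work (PFlop):", total_tflop / 1000.0)
print("sustained rate (GFlop/s):", rate_gflops)
```

The scaled run time comes out at 128 minutes, which the post rounds to two hours; with that rounding the rate lands a little above 220 GFlop/s.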

I should not start thinking about what the newer GPUs or even the next generation can do. I would be prepared for north of 600 GFlop/s on a fast HD4800 :o
But the GT200-based NVIDIA cards have a chance to get close in single precision. That is different from the situation in double precision.

Emanuel
Joined: 18 Nov 07
Posts: 280
Credit: 2,442,757
RAC: 539
Message 26153 - Posted: 21 Jun 2009 | 19:55:58 UTC - in response to Message 26112.

By the way, that small test unit took about 4 minutes, i.e. half a second per evaluation ;)
But as it is a two stream WU with 80 x 800 x 350 spatial and 60 convolution steps, one can estimate the time for the bigger WUs (neglecting that the efficiency rises slightly with the size). Current production WUs have up to 320 x 1600 x 700 spatial and 120 convolution steps. That is a factor of 4 x 2 x 2 x 2 = 32 bigger. It would take about two hours total

Could you give us a quick summary of what the GPUs are doing that the CPU code doesn't? It sounds like they're computing a great deal more!

Cluster Physik
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 26159 - Posted: 21 Jun 2009 | 20:41:42 UTC - in response to Message 26153.

By the way, that small test unit took about 4 minutes, i.e. half a second per evaluation ;)
But as it is a two stream WU with 80 x 800 x 350 spatial and 60 convolution steps, one can estimate the time for the bigger WUs (neglecting that the efficiency rises slightly with the size). Current production WUs have up to 320 x 1600 x 700 spatial and 120 convolution steps. That is a factor of 4 x 2 x 2 x 2 = 32 bigger. It would take about two hours total

Could you give us a quick summary of what the GPUs are doing that the CPU code doesn't? It sounds like they're computing a great deal more!

The CPU code does just one "evaluation". That means the server sends a bunch of parameters (a WU) for a small volume of the Milky Way, and the CPU code checks how well these parameters fit reality, i.e. the observed stars in that region. The result (called "fitness" or "likelihood") is then sent back. From all the results, the server tries to determine (using different algorithms like genetic search [gs_ WUs] or particle search [ps_ WUs]) in which directions the parameters have to evolve to get a better fitness.

The difference with the GPU project is that not just a single set of parameters is checked, but more or less a whole region of parameter sets. That's why there are so many numbers in the result file of the GPU code posted above ;) In principle, the scientific app takes over a small part of the search algorithm. It doesn't do just one simple check; it looks around a bit to see in which direction the fitness gets better. In the case of the double stream WUs, this means the app does 445 evaluations of different parameter sets instead of a single one. So it really is a great deal more work. In the case of triple stream WUs it would actually be about 900 evaluations, as there are more parameter combinations possible (but only about 150 for single stream WUs).

Emanuel
Joined: 18 Nov 07
Posts: 280
Credit: 2,442,757
RAC: 539
Message 26176 - Posted: 22 Jun 2009 | 0:38:13 UTC - in response to Message 26159.

Thanks :) I'm looking forward to the GPU apps even more now!

Cluster Physik
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 26257 - Posted: 22 Jun 2009 | 21:10:59 UTC - in response to Message 26112.

By the way, that small test unit took about 4 minutes, i.e. half a second per evaluation ;)
But as it is a two stream WU with 80 x 800 x 350 spatial and 60 convolution steps, one can estimate the time for the bigger WUs (neglecting that the efficiency rises slightly with the size). Current production WUs have up to 320 x 1600 x 700 spatial and 120 convolution steps. That is a factor of 4 x 2 x 2 x 2 = 32 bigger. It would take about two hours in total for about 445 * 3.6 TFlop ~ 1.6 Peta(!)Flop. That is about 220 GFlop/s with an overclocked (10%) HD3870.

I have just done an actual count of all the operations on the GPU (one can neglect the preparation of the lookup tables done on the CPU). One evaluation of the stripe20 test WU represents about 113.94 GFlop with the single precision code.

445 evaluations * 113.94 GFlop / 215.14 seconds = 235 GFlop/s for that HD3870@860MHz
(the theoretical peak would be 550 GFlop/s; that of a 3 GHz quad core is only 96 GFlop/s).
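
As a quick check of the figures above, the sustained rate and its share of the theoretical peak can be computed directly. This Python sketch uses only the numbers given in this post; the variable names are illustrative.

```python
evaluations = 445
gflop_per_evaluation = 113.94   # counted for the stripe20 test WU
runtime_seconds = 215.14

gpu_peak_gflops = 550.0         # HD3870 at 860 MHz, theoretical peak from the post
cpu_peak_gflops = 96.0          # 3 GHz quad core, theoretical peak from the post

sustained_gflops = evaluations * gflop_per_evaluation / runtime_seconds
fraction_of_gpu_peak = sustained_gflops / gpu_peak_gflops
ratio_to_cpu_peak = sustained_gflops / cpu_peak_gflops

print("sustained GFlop/s:", round(sustained_gflops, 1))
print("share of GPU peak:", round(fraction_of_gpu_peak, 3))
print("multiple of CPU peak:", round(ratio_to_cpu_peak, 2))
```

With these inputs the card sustains a bit under 43% of its theoretical peak, and roughly 2.5 times the theoretical peak of the quad core.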

I used quite simple counting rules: addition and multiplication count as 1 flop, a division (only real ones, not those transformed to a multiplication by the compiler) and a square root count as 4 flops, and pow/exp/log also count as only 4 flops. This is a small deviation from the "standard" practice (if such a thing exists in this respect) of counting 8 flops for the latter. The reason is that the GPUs have hardware support for these instructions (as opposed to double precision) with a fourth to a fifth of the throughput of the simple instructions (the same throughput as for divisions or square roots).
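
The counting rules above amount to a small weight table. The Python sketch below encodes it and applies it to a set of operation counts; the counts themselves are invented for illustration and are not the real per-evaluation numbers, and the function name is an assumption.

```python
# Weights from the post: add/mul = 1, div/sqrt = 4, pow/exp/log = 4.
FLOP_WEIGHTS = {
    "add": 1, "mul": 1,
    "div": 4, "sqrt": 4,
    "pow": 4, "exp": 4, "log": 4,
}
TRANSCENDENTALS = ("pow", "exp", "log")


def weighted_flops(op_counts, transcendental_weight=4):
    """Sum operation counts using the weights above.

    Pass transcendental_weight=8 to get the more common convention
    mentioned in the post for pow/exp/log.
    """
    total = 0
    for op, count in op_counts.items():
        weight = FLOP_WEIGHTS[op]
        if op in TRANSCENDENTALS:
            weight = transcendental_weight
        total += weight * count
    return total


# Hypothetical operation counts for one inner-loop iteration.
example_counts = {"add": 10, "mul": 12, "div": 1, "sqrt": 1, "exp": 2}

conservative = weighted_flops(example_counts)                          # 10 + 12 + 4 + 4 + 8
conventional = weighted_flops(example_counts, transcendental_weight=8)  # 10 + 12 + 4 + 4 + 16
print(conservative, conventional)
```

The gap between the two totals shows how much the choice of weight for pow/exp/log alone can shift a reported GFlop/s figure.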

Furthermore, I was a bit conservative: exp(x), for instance, is executed as pow(2, x*log2(e)) and log(x) as log2(x)*log(2), but I counted each whole construct as 4 flops. The actual implementation depends on the exact GPU and may change in the future. Therefore I decided to count the instructions in the C code and not in the GPU assembly, as I did for the double precision version. This results in a considerably lower flops count than for double precision, but this should be compensated by the far higher execution speed in single precision.
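
The two rewrites mentioned above are ordinary change-of-base identities, and each one adds a multiplication to the base-2 hardware instruction. A small Python check, using the standard `math` module and illustrative function names, confirms that they agree with the direct functions:

```python
import math


def exp_via_pow2(x):
    """exp(x) expressed as pow(2, x * log2(e))."""
    return math.pow(2.0, x * math.log2(math.e))


def log_via_log2(x):
    """Natural log expressed as log2(x) * log(2)."""
    return math.log2(x) * math.log(2.0)


for x in (0.5, 3.0, 100.0):
    print(x, math.exp(x), exp_via_pow2(x), math.log(x), log_via_log2(x))
```

Each rewritten form is one base-2 operation plus one multiply, so counting the whole construct as 4 flops leaves that extra multiply out.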

@Travis:
You will get a PM.


Copyright © 2013 AstroInformatics Group