milkyway & milkywayGPU makefile
Author | Message |
---|---|
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
Here's a thread for discussing (and improving) the makefile we're using for milkyway. The newest code release has a combined Linux, OSX and GPU makefile. I've tested it on OSX and it works fine. Unfortunately, AFAIK there is no 64-bit or PPC version of CUDA for OSX, so it will only compile an i686 binary. I don't have a Linux machine with a GPU to test the makefile, so let me know if it works there (I'm pretty sure it should, as it shouldn't be doing anything different from OSX). |
Send message Joined: 9 Sep 07 Posts: 22 Credit: 320,035 RAC: 0 |
I have a 64 bit Linux machine with a CUDA capable card. Getting it to work with BOINC hasn't gone well. If you can give a general idea of what you'd like me to do, I'd be happy to get compiling ;) Kathryn :o) The BOINC FAQ Service The Unofficial BOINC Wiki The Trac System More BOINC information than you can shake a stick of RAM at. |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
"I have a 64 bit Linux machine with a CUDA capable card. Getting it to work with BOINC hasn't gone well." Welp, your guess is as good as mine :) JK. You'd need to download the CUDA driver and toolkit from http://www.nvidia.com/object/cuda_get.html and check that they work by running the samples. After that, you should be able to download the latest GPU code from http://milkyway.cs.rpi.edu/milkyway/download/code_release/ and unzip it. Then go to the /milkyway/bin/ directory and run the makefile with: make linux_x86_64_gpu. You'll probably need to edit the makefile so that its directories point to where you have BOINC and CUDA installed. |
Send message Joined: 30 Nov 08 Posts: 11 Credit: 25,658 RAC: 0 |
Some questions about compilation with make linux_x86_64_gpu: 1) What is "evaluator.h" in evaluation/simple_evaluator.c and searches/[hessian,line_search,gradient].c? Is it evaluation/simple_evaluator.h? 2) The evaluate function has no visible declaration in searches/[gradient,hessian,regression,line_search].c: ../searches/hessian.c:127: error: 'evaluate' was not declared in this scope ../searches/hessian.c: In function 'void get_hessian(int, double*, double*, double**)': ../searches/hessian.c:188: error: 'evaluate' was not declared in this scope ../searches/hessian.c:196: error: 'evaluate' was not declared in this scope make: *** [../searches/hessian.o] Error 1 PS: sorry for my English |
Send message Joined: 8 Nov 08 Posts: 178 Credit: 6,140,854 RAC: 0 |
"I've tested it on OSX and it works fine -- unfortunately AFAIK there is no 64 bit or PPC version of CUDA for OSX so it will only compile an i686 binary." AFAIK, there is no CUDA library for OS X, period. Nvidia doesn't release separate drivers, so they have to work with Apple to get them into an OS update. With Apple pushing OpenCL though, they may have to go with OpenCL first. I don't know what Apple's schedule is on that. |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
"I've tested it on OSX and it works fine -- unfortunately AFAIK there is no 64 bit or PPC version of CUDA for OSX so it will only compile an i686 binary." There's a 32 bit CUDA library for Intel Macs. That's what I've been using. |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
"Some questions about compilation with make linux_x86_64_gpu" Looks like I missed yet another file :( I'll update the v0.05 release. *update* OK, it should be in there now. |
Send message Joined: 30 Nov 08 Posts: 11 Credit: 25,658 RAC: 0 |
bin/Makefile line 147 is missing a space between the output file name and $(APP_OBJS). Current: $(OBJ_CXX) $(OBJ_CXXFLAGS) $(LDFLAGS_x86_64) -o milkywayGPU_$(APP_VERSION)_x86_64-pc-linux-gnu$(APP_OBJS) $(SEARCH_OBJS) $(UTIL_OBJS) $(GPU_APP_OBJS) -lboinc -lboinc_api -lcudart Fixed: $(OBJ_CXX) $(OBJ_CXXFLAGS) $(LDFLAGS_x86_64) -o milkywayGPU_$(APP_VERSION)_x86_64-pc-linux-gnu $(APP_OBJS) $(SEARCH_OBJS) $(UTIL_OBJS) $(GPU_APP_OBJS) -lboinc -lboinc_api -lcudart |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
"bin/Makefile line 147" Nice catch, it'll be in the next update. |
Send message Joined: 30 Nov 08 Posts: 11 Credit: 25,658 RAC: 0 |
linux_x86_64_gpu: maybe it's my problem, but linking libboinc_api.a gives errors without openssl, so I just added -lssl to line 147. How do I run the test units with milkywayGPU_0.18_x86_64-pc-linux-gnu? Update: renamed *-20.txt to *txt, executing... looks like it works. |
Send message Joined: 30 Nov 08 Posts: 11 Credit: 25,658 RAC: 0 |
linux_x86_64_gpu out, sorry for huge post. initial likelihood: -2.98530684176687044484 point[14]: 0.57171300000000002672, 12.31211899999999914712, -3.30518700000000009709, 148.01025699999999574175, 22.45390199999999936153, 0.42035000000000000142, -0.46885799999999999699, 0.76057900000000000507, -1.36164400000000007651, 177.88423800000001051558, 23.88289199999999823376, 1.21063900000000002066, -1.61197400000000001796, 8.53437800000000024170 step[14]: 0.00000400000000000000, 0.00008000000000000001, 0.00000100000000000000, 0.00003000000000000000, 0.00004000000000000000, 0.00006000000000000000, 0.00004000000000000000, 0.00000400000000000000, 0.00000100000000000000, 0.00003000000000000000, 0.00004000000000000000, 0.00006000000000000000, 0.00004000000000000000, 0.00000400000000000000 hessian[0][0] = 882.45647594797912915965, (-2.98530682678311976019 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530680027340666882)/(4 * 0.00000400000000000000 * 0.00000400000000000000) hessian[0][1] = hessian[1][0] = -6.30326069117614817827, (-2.98530681295196531622 - -2.98530680373119539084 - -2.98530676800071281818 + -2.98530676684811657751)/(4 * 0.00000400000000000000 * 0.00008000000000000001) hessian[0][2] = hessian[2][0] = -792.40988770656883843913, (-2.98530683369869720423 - -2.98530681525715779756 - -2.98530677952667522490 + -2.98530677376369402154)/(4 * 0.00000400000000000000 * 0.00000100000000000000) hessian[0][3] = hessian[3][0] = -69.63602102357431533619, (-2.98530682908831224154 - -2.98530681756235027891 - -2.98530676800071281818 + -2.98530678990004094686)/(4 * 0.00000400000000000000 * 0.00003000000000000000) hessian[0][4] = hessian[4][0] = 54.02794739373106125413, (-2.98530681525715779756 - -2.98530682908831224154 - -2.98530679335782966888 + -2.98530677261109778087)/(4 * 0.00000400000000000000 * 0.00004000000000000000) hessian[0][5] = hessian[5][0] = 55.22856894035754748984, (-2.98530680488379163151 - -2.98530681525715779756 - -2.98530681525715779756 + 
-2.98530677261109778087)/(4 * 0.00000400000000000000 * 0.00006000000000000000) hessian[0][6] = hessian[6][0] = 21.61117881871454571296, (-2.98530682447792727885 - -2.98530684061427420417 - -2.98530678298446350283 + -2.98530678528965598417)/(4 * 0.00000400000000000000 * 0.00004000000000000000) hessian[0][7] = hessian[7][0] = -198.10247886553611351701, (-2.98530682793571600087 - -2.98530683139350472288 - -2.98530678298446350283 + -2.98530679912081087224)/(4 * 0.00000400000000000000 * 0.00000400000000000000) hessian[0][8] = hessian[8][0] = -1152.59621291663461306598, (-2.98530683024090848221 - -2.98530682102013855683 - -2.98530677261109778087 + -2.98530678183186726216)/(4 * 0.00000400000000000000 * 0.00000100000000000000) hessian[0][9] = hessian[9][0] = 36.01863252100656609400, (-2.98530681986754231616 - -2.98530683485129344490 - -2.98530678644225222484 + -2.98530678413705974350)/(4 * 0.00000400000000000000 * 0.00003000000000000000) hessian[0][10] = hessian[10][0] = 10.80558975630196805184, (-2.98530681525715779756 - -2.98530681871494651958 - -2.98530678759484846552 + -2.98530678413705974350)/(4 * 0.00000400000000000000 * 0.00004000000000000000) hessian[0][11] = hessian[11][0] = -22.81180013404456374815, (-2.98530680949417659420 - -2.98530678413705974350 - -2.98530681525715779756 + -2.98530681179936907554)/(4 * 0.00000400000000000000 * 0.00006000000000000000) hessian[0][12] = hessian[12][0] = 90.04657991473762024270, (-2.98530678874744470619 - -2.98530683830908216692 - -2.98530680834158035353 + -2.98530680027340666882)/(4 * 0.00000400000000000000 * 0.00004000000000000000) hessian[0][13] = hessian[13][0] = 108.05589062412579437478, (-2.98530683254610096355 - -2.98530684176687044484 - -2.98530678874744470619 + -2.98530679105263718753)/(4 * 0.00000400000000000000 * 0.00000400000000000000) hessian[1][1] = 2.70139735233931821412, (-2.98530683485129344490 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530677952667522490)/(4 * 0.00008000000000000001 * 
0.00008000000000000001) hessian[1][2] = hessian[2][1] = 3.60186325210065616531, (-2.98530682678311976019 - -2.98530682102013855683 - -2.98530681410456155689 + -2.98530680718898411286)/(4 * 0.00008000000000000001 * 0.00000100000000000000) hessian[1][3] = hessian[3][1] = 3.36173894277536033925, (-2.98530682908831224154 - -2.98530684522465916686 - -2.98530680027340666882 + -2.98530678413705974350)/(4 * 0.00008000000000000001 * 0.00003000000000000000) hessian[1][4] = hessian[4][1] = -1.44074526614579290218, (-2.98530684868244788888 - -2.98530682102013855683 - -2.98530679681561839089 + -2.98530678759484846552)/(4 * 0.00008000000000000001 * 0.00004000000000000000) hessian[1][5] = hessian[5][1] = -0.42021737941174319708, (-2.98530682563052351952 - -2.98530680718898411286 - -2.98530678874744470619 + -2.98530677837407898423)/(4 * 0.00008000000000000001 * 0.00006000000000000000) hessian[1][6] = hessian[6][1] = 3.42176998541221477623, (-2.98530681295196531622 - -2.98530684983504412955 - -2.98530680142600290949 + -2.98530679451042590955)/(4 * 0.00008000000000000001 * 0.00004000000000000000) hessian[1][7] = hessian[7][1] = 0.90046581302516404133, (-2.98530683485129344490 - -2.98530683485129344490 - -2.98530679105263718753 + -2.98530678990004094686)/(4 * 0.00008000000000000001 * 0.00000400000000000000) hessian[1][8] = hessian[8][1] = 64.83353715003303818776, (-2.98530682102013855683 - -2.98530682908831224154 - -2.98530681986754231616 + -2.98530680718898411286)/(4 * 0.00008000000000000001 * 0.00000100000000000000) hessian[1][9] = hessian[9][1] = 0.24012417054741772016, (-2.98530683369869720423 - -2.98530683024090848221 - -2.98530680142600290949 + -2.98530679566302215022)/(4 * 0.00008000000000000001 * 0.00003000000000000000) hessian[1][10] = hessian[10][1] = 0.09004658130251640136, (-2.98530683369869720423 - -2.98530683485129344490 - -2.98530679335782966888 + -2.98530679335782966888)/(4 * 0.00008000000000000001 * 0.00004000000000000000) hessian[1][11] = hessian[11][1] = 
0.42021735628209683222, (-2.98530683139350472288 - -2.98530681179936907554 - -2.98530682217273479750 + -2.98530679451042590955)/(4 * 0.00008000000000000001 * 0.00006000000000000000) hessian[1][12] = hessian[12][1] = 2.07107130056893806724, (-2.98530677261109778087 - -2.98530682563052351952 - -2.98530676915330905885 + -2.98530679566302215022)/(4 * 0.00008000000000000001 * 0.00004000000000000000) hessian[1][13] = hessian[13][1] = -3.60186325210065616531, (-2.98530683254610096355 - -2.98530683139350472288 - -2.98530678528965598417 + -2.98530678874744470619)/(4 * 0.00008000000000000001 * 0.00000400000000000000) hessian[2][2] = -1152.59635169451257752371, (-2.98530684868244788888 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530683946167840759)/(4 * 0.00000100000000000000 * 0.00000100000000000000) hessian[2][3] = hessian[3][2] = 96.04968302194076557043, (-2.98530682217273479750 - -2.98530684637725540753 - -2.98530682217273479750 + -2.98530683485129344490)/(4 * 0.00000100000000000000 * 0.00003000000000000000) hessian[2][4] = hessian[4][2] = -36.01863252100655898857, (-2.98530683715648592624 - -2.98530682102013855683 - -2.98530683254610096355 + -2.98530682217273479750)/(4 * 0.00000100000000000000 * 0.00004000000000000000) hessian[2][5] = hessian[5][2] = 28.81490786717695939956, (-2.98530682447792727885 - -2.98530681756235027891 - -2.98530681756235027891 + -2.98530680373119539084)/(4 * 0.00000100000000000000 * 0.00006000000000000000) hessian[2][6] = hessian[6][2] = 0.00000000000000000000, (-2.98530682678311976019 - -2.98530684061427420417 - -2.98530681410456155689 + -2.98530682793571600087)/(4 * 0.00000100000000000000 * 0.00004000000000000000) hessian[2][7] = hessian[7][2] = 288.14903241247691312310, (-2.98530683715648592624 - -2.98530683139350472288 - -2.98530684291946668552 + -2.98530683254610096355)/(4 * 0.00000100000000000000 * 0.00000400000000000000) hessian[2][8] = hessian[8][2] = 3169.63955082627535375650, (-2.98530683254610096355 - 
-2.98530684868244788888 - -2.98530683369869720423 + -2.98530683715648592624)/(4 * 0.00000100000000000000 * 0.00000100000000000000) hessian[2][9] = hessian[9][2] = 19.20993364379341983295, (-2.98530683946167840759 - -2.98530683139350472288 - -2.98530682908831224154 + -2.98530681871494651958)/(4 * 0.00000100000000000000 * 0.00003000000000000000) hessian[2][10] = hessian[10][2] = -93.64844177905949607066, (-2.98530684868244788888 - -2.98530680488379163151 - -2.98530683946167840759 + -2.98530681064677283487)/(4 * 0.00000100000000000000 * 0.00004000000000000000) hessian[2][11] = hessian[11][2] = 0.00000185037170770859, (-2.98530682102013855683 - -2.98530679451042590955 - -2.98530681525715779756 + -2.98530678874744470619)/(4 * 0.00000100000000000000 * 0.00006000000000000000) hessian[2][12] = hessian[12][2] = 7.20372650420131233062, (-2.98530676454292409616 - -2.98530681295196531622 - -2.98530676339032785549 + -2.98530681064677283487)/(4 * 0.00000100000000000000 * 0.00004000000000000000) hessian[2][13] = hessian[13][2] = -432.22359025207879312802, (-2.98530683715648592624 - -2.98530682678311976019 - -2.98530682678311976019 + -2.98530682332533103818)/(4 * 0.00000100000000000000 * 0.00000400000000000000) hessian[3][3] = 3.52182172314030594862, (-2.98530682793571600087 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530684291946668552)/(4 * 0.00003000000000000000 * 0.00003000000000000000) hessian[3][4] = hessian[4][3] = -4.56235993429032671287, (-2.98530684176687044484 - -2.98530681871494651958 - -2.98530684522465916686 + -2.98530684407206292619)/(4 * 0.00003000000000000000 * 0.00004000000000000000) hessian[3][5] = hessian[5][3] = 1.28066248963578899200, (-2.98530682332533103818 - -2.98530681986754231616 - -2.98530683369869720423 + -2.98530682102013855683)/(4 * 0.00003000000000000000 * 0.00006000000000000000) hessian[3][6] = hessian[6][3] = -1.20062099151496659566, (-2.98530684637725540753 - -2.98530685098764037022 - -2.98530683254610096355 + 
-2.98530684291946668552)/(4 * 0.00003000000000000000 * 0.00004000000000000000) hessian[3][7] = hessian[7][3] = -7.20372650420131321880, (-2.98530683254610096355 - -2.98530682793571600087 - -2.98530684637725540753 + -2.98530684522465916686)/(4 * 0.00003000000000000000 * 0.00000400000000000000) hessian[3][8] = hessian[8][3] = -67.23477700513551269523, (-2.98530684407206292619 - -2.98530682332533103818 - -2.98530685444542909224 + -2.98530684176687044484)/(4 * 0.00003000000000000000 * 0.00000100000000000000) hessian[3][9] = hessian[9][3] = 1.92099373445368359903, (-2.98530683369869720423 - -2.98530683024090848221 - -2.98530685214023661089 + -2.98530684176687044484)/(4 * 0.00003000000000000000 * 0.00003000000000000000) hessian[3][10] = hessian[10][3] = -1.44074520832167696227, (-2.98530685098764037022 - -2.98530681756235027891 - -2.98530685329283285157 + -2.98530682678311976019)/(4 * 0.00003000000000000000 * 0.00004000000000000000) hessian[3][11] = hessian[11][3] = -0.32016562240894724800, (-2.98530682908831224154 - -2.98530681525715779756 - -2.98530682102013855683 + -2.98530680949417659420)/(4 * 0.00003000000000000000 * 0.00006000000000000000) hessian[3][12] = hessian[12][3] = -0.48024834109483544031, (-2.98530678298446350283 - -2.98530683139350472288 - -2.98530677952667522490 + -2.98530683024090848221)/(4 * 0.00003000000000000000 * 0.00004000000000000000) hessian[3][13] = hessian[13][3] = 0.00000000000000000000, (-2.98530683254610096355 - -2.98530683024090848221 - -2.98530684868244788888 + -2.98530684637725540753)/(4 * 0.00003000000000000000 * 0.00000400000000000000) hessian[4][4] = 1.08055890624125772170, (-2.98530683600388968557 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530684061427420417)/(4 * 0.00004000000000000000 * 0.00004000000000000000) hessian[4][5] = hessian[5][4] = -2.64136629235522901737, (-2.98530684061427420417 - -2.98530681986754231616 - -2.98530681756235027891 + -2.98530682217273479750)/(4 * 0.00004000000000000000 * 
0.00006000000000000000) hessian[4][6] = hessian[6][4] = 1.62083853283423429126, (-2.98530684176687044484 - -2.98530685905581405493 - -2.98530683715648592624 + -2.98530684407206292619)/(4 * 0.00004000000000000000 * 0.00004000000000000000) hessian[4][7] = hessian[7][4] = -1.80093162605032808266, (-2.98530685444542909224 - -2.98530685214023661089 - -2.98530683715648592624 + -2.98530683600388968557)/(4 * 0.00004000000000000000 * 0.00000400000000000000) hessian[4][8] = hessian[8][4] = 7.20372927975887389351, (-2.98530684291946668552 - -2.98530683715648592624 - -2.98530683600388968557 + -2.98530682908831224154)/(4 * 0.00004000000000000000 * 0.00000100000000000000) hessian[4][9] = hessian[9][4] = 5.28273267722904371624, (-2.98530684983504412955 - -2.98530684637725540753 - -2.98530683830908216692 + -2.98530680949417659420)/(4 * 0.00004000000000000000 * 0.00003000000000000000) hessian[4][10] = hessian[10][4] = 6.30326055239827010013, (-2.98530684061427420417 - -2.98530683139350472288 - -2.98530684407206292619 + -2.98530679451042590955)/(4 * 0.00004000000000000000 * 0.00004000000000000000) hessian[4][11] = hessian[11][4] = -2.88149055542123200269, (-2.98530682908831224154 - -2.98530680142600290949 - -2.98530681756235027891 + -2.98530681756235027891)/(4 * 0.00004000000000000000 * 0.00006000000000000000) hessian[4][12] = hessian[12][4] = 0.18009316260503280271, (-2.98530678644225222484 - -2.98530683254610096355 - -2.98530678528965598417 + -2.98530683024090848221)/(4 * 0.00004000000000000000 * 0.00004000000000000000) hessian[4][13] = hessian[13][4] = 5.40279487815098402592, (-2.98530683830908216692 - -2.98530683715648592624 - -2.98530683024090848221 + -2.98530682563052351952)/(4 * 0.00004000000000000000 * 0.00000400000000000000) hessian[5][5] = 3.92202875115148996699, (-2.98530683254610096355 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530679451042590955)/(4 * 0.00006000000000000000 * 0.00006000000000000000) hessian[5][6] = hessian[6][5] = 
-7.32378842756749648402, (-2.98530684291946668552 - -2.98530684176687044484 - -2.98530677145850154020 + -2.98530684061427420417)/(4 * 0.00006000000000000000 * 0.00004000000000000000) hessian[5][7] = hessian[7][5] = 7.20372696679423984989, (-2.98530682793571600087 - -2.98530683946167840759 - -2.98530681756235027891 + -2.98530682217273479750)/(4 * 0.00006000000000000000 * 0.00000400000000000000) hessian[5][8] = hessian[8][5] = 33.61738850256775634762, (-2.98530683715648592624 - -2.98530682447792727885 - -2.98530683600388968557 + -2.98530681525715779756)/(4 * 0.00006000000000000000 * 0.00000100000000000000) hessian[5][9] = hessian[9][5] = 4.16215302963725708452, (-2.98530682793571600087 - -2.98530683830908216692 - -2.98530684061427420417 + -2.98530682102013855683)/(4 * 0.00006000000000000000 * 0.00003000000000000000) hessian[5][10] = hessian[10][5] = 1.08055892937090414208, (-2.98530683139350472288 - -2.98530682217273479750 - -2.98530683254610096355 + -2.98530681295196531622)/(4 * 0.00006000000000000000 * 0.00004000000000000000) hessian[5][11] = hessian[11][5] = 0.80041402518283966128, (-2.98530681640975403823 - -2.98530680142600290949 - -2.98530680718898411286 + -2.98530678067927102148)/(4 * 0.00006000000000000000 * 0.00006000000000000000) hessian[5][12] = hessian[12][5] = -0.72037265042013121086, (-2.98530679220523342821 - -2.98530680949417659420 - -2.98530678644225222484 + -2.98530681064677283487)/(4 * 0.00006000000000000000 * 0.00004000000000000000) hessian[5][13] = hessian[13][5] = 1.20062108403355227715, (-2.98530683830908216692 - -2.98530683369869720423 - -2.98530682563052351952 + -2.98530681986754231616)/(4 * 0.00006000000000000000 * 0.00000400000000000000) hessian[6][6] = 1.08055883685231868263, (-2.98530683830908216692 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530683830908216692)/(4 * 0.00004000000000000000 * 0.00004000000000000000) hessian[6][7] = hessian[7][6] = -12.60652068846290596582, (-2.98530684176687044484 - -2.98530683600388968557 - 
-2.98530685214023661089 + -2.98530685444542909224)/(4 * 0.00004000000000000000 * 0.00000400000000000000) hessian[6][8] = hessian[8][6] = -108.05589756301968407115, (-2.98530685329283285157 - -2.98530682102013855683 - -2.98530684752985164820 + -2.98530683254610096355)/(4 * 0.00004000000000000000 * 0.00000100000000000000) hessian[6][9] = hessian[9][6] = -2.88149050916193960603, (-2.98530685329283285157 - -2.98530683369869720423 - -2.98530685098764037022 + -2.98530684522465916686)/(4 * 0.00004000000000000000 * 0.00003000000000000000) hessian[6][10] = hessian[10][6] = -3.24167685750165146530, (-2.98530683600388968557 - -2.98530681525715779756 - -2.98530684291946668552 + -2.98530684291946668552)/(4 * 0.00004000000000000000 * 0.00004000000000000000) hessian[6][11] = hessian[11][6] = 4.08211154693619882039, (-2.98530683139350472288 - -2.98530682102013855683 - -2.98530686481879481420 + -2.98530681525715779756)/(4 * 0.00004000000000000000 * 0.00006000000000000000) hessian[6][12] = hessian[12][6] = 1.08055897563019676078, (-2.98530680027340666882 - -2.98530684522465916686 - -2.98530680373119539084 + -2.98530684176687044484)/(4 * 0.00004000000000000000 * 0.00004000000000000000) hessian[6][13] = hessian[13][6] = 14.40745231451323427052, (-2.98530683600388968557 - -2.98530684637725540753 - -2.98530683830908216692 + -2.98530683946167840759)/(4 * 0.00004000000000000000 * 0.00000400000000000000) hessian[7][7] = 36.01862558211266218677, (-2.98530684176687044484 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530683946167840759)/(4 * 0.00000400000000000000 * 0.00000400000000000000) hessian[7][8] = hessian[8][7] = -648.33538537811818969203, (-2.98530684868244788888 - -2.98530683369869720423 - -2.98530684291946668552 + -2.98530683830908216692)/(4 * 0.00000400000000000000 * 0.00000100000000000000) hessian[7][9] = hessian[9][7] = -2.40124216806710455430, (-2.98530684752985164820 - -2.98530683715648592624 - -2.98530684522465916686 + -2.98530683600388968557)/(4 * 
0.00000400000000000000 * 0.00003000000000000000) hessian[7][10] = hessian[10][7] = -45.02328926347941973063, (-2.98530684868244788888 - -2.98530681179936907554 - -2.98530683600388968557 + -2.98530682793571600087)/(4 * 0.00000400000000000000 * 0.00004000000000000000) hessian[7][11] = hessian[11][7] = -25.21304230211167052289, (-2.98530683946167840759 - -2.98530680488379163151 - -2.98530681640975403823 + -2.98530680603638787218)/(4 * 0.00000400000000000000 * 0.00006000000000000000) hessian[7][12] = hessian[12][7] = -19.81024719266421740826, (-2.98530679220523342821 - -2.98530683139350472288 - -2.98530677722148274356 + -2.98530682908831224154)/(4 * 0.00000400000000000000 * 0.00004000000000000000) hessian[7][13] = hessian[13][7] = -36.01863252100656609400, (-2.98530684637725540753 - -2.98530683946167840759 - -2.98530684061427420417 + -2.98530683600388968557)/(4 * 0.00000400000000000000 * 0.00000400000000000000) hessian[8][8] = 6051.13004148449817876099, (-2.98530682332533103818 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530683600388968557)/(4 * 0.00000100000000000000 * 0.00000100000000000000) hessian[8][9] = hessian[9][8] = -76.83974937814734573749, (-2.98530685559802533291 - -2.98530684061427420417 - -2.98530683485129344490 + -2.98530682908831224154)/(4 * 0.00000100000000000000 * 0.00003000000000000000) hessian[8][10] = hessian[10][8] = 201.70433656652161857892, (-2.98530685559802533291 - -2.98530684868244788888 - -2.98530684868244788888 + -2.98530680949417659420)/(4 * 0.00000100000000000000 * 0.00004000000000000000) hessian[8][11] = hessian[11][8] = -24.01241983029933635407, (-2.98530682678311976019 - -2.98530681064677283487 - -2.98530681295196531622 + -2.98530680257859915017)/(4 * 0.00000100000000000000 * 0.00006000000000000000) hessian[8][12] = hessian[12][8] = 151.27825103711242604732, (-2.98530678874744470619 - -2.98530684061427420417 - -2.98530676800071281818 + -2.98530679566302215022)/(4 * 0.00000100000000000000 * 0.00004000000000000000) 
hessian[8][13] = hessian[13][8] = -216.11179512603939656401, (-2.98530684752985164820 - -2.98530683369869720423 - -2.98530684061427420417 + -2.98530683024090848221)/(4 * 0.00000100000000000000 * 0.00000400000000000000) hessian[9][9] = 1.28066236627767526812, (-2.98530685098764037022 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530682793571600087)/(4 * 0.00003000000000000000 * 0.00003000000000000000) hessian[9][10] = hessian[10][9] = -1.68086951764697278833, (-2.98530685675062157358 - -2.98530683024090848221 - -2.98530684061427420417 + -2.98530682217273479750)/(4 * 0.00003000000000000000 * 0.00004000000000000000) hessian[9][11] = hessian[11][9] = 2.08107648397910027782, (-2.98530682908831224154 - -2.98530680373119539084 - -2.98530682908831224154 + -2.98530678874744470619)/(4 * 0.00003000000000000000 * 0.00006000000000000000) hessian[9][12] = hessian[12][9] = -6.96360219487601650457, (-2.98530678874744470619 - -2.98530682447792727885 - -2.98530676800071281818 + -2.98530683715648592624)/(4 * 0.00003000000000000000 * 0.00004000000000000000) hessian[9][13] = hessian[13][9] = 14.40745300840262643760, (-2.98530684752985164820 - -2.98530683369869720423 - -2.98530684868244788888 + -2.98530682793571600087)/(4 * 0.00003000000000000000 * 0.00000400000000000000) hessian[10][10] = 5.76298106458317160872, (-2.98530684061427420417 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530680603638787218)/(4 * 0.00004000000000000000 * 0.00004000000000000000) hessian[10][11] = hessian[11][10] = 0.24012421680671039437, (-2.98530680142600290949 - -2.98530678990004094686 - -2.98530683024090848221 + -2.98530681640975403823)/(4 * 0.00004000000000000000 * 0.00006000000000000000) hessian[10][12] = hessian[12][10] = 2.88149053229158580436, (-2.98530679335782966888 - -2.98530684637725540753 - -2.98530677837407898423 + -2.98530681295196531622)/(4 * 0.00004000000000000000 * 0.00004000000000000000) hessian[10][13] = hessian[13][10] = -10.80558975630196805184, 
(-2.98530685790321781425 - -2.98530684983504412955 - -2.98530681871494651958 + -2.98530681756235027891)/(4 * 0.00004000000000000000 * 0.00000400000000000000) hessian[11][11] = 3.28169753717312406849, (-2.98530682678311976019 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530680949417659420)/(4 * 0.00006000000000000000 * 0.00006000000000000000) hessian[11][12] = hessian[12][11] = 2.28117996714516335643, (-2.98530679335782966888 - -2.98530682217273479750 - -2.98530677145850154020 + -2.98530677837407898423)/(4 * 0.00006000000000000000 * 0.00004000000000000000) hessian[11][13] = hessian[13][11] = -2.40124216806710455430, (-2.98530682447792727885 - -2.98530682563052351952 - -2.98530680718898411286 + -2.98530681064677283487)/(4 * 0.00006000000000000000 * 0.00000400000000000000) hessian[12][12] = 20.35052688864613301689, (-2.98530677145850154020 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530678183186726216)/(4 * 0.00004000000000000000 * 0.00004000000000000000) hessian[12][13] = hessian[13][12] = -16.20838463445295474230, (-2.98530677145850154020 - -2.98530677030590529952 - -2.98530682102013855683 + -2.98530683024090848221)/(4 * 0.00004000000000000000 * 0.00000400000000000000) hessian[13][13] = 198.10247192664220960978, (-2.98530684407206292619 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530682678311976019)/(4 * 0.00000400000000000000 * 0.00000400000000000000) gradient[0]: -0.00633927915716370194, (-2.98530684061427420417 - -2.98530678990004094686)/(2 * 0.000004) gradient[1]: -0.00025933414860013215, (-2.98530683254610096355 - -2.98530679105263718753)/(2 * 0.000080) gradient[2]: -0.00576298120336105058, (-2.98530683946167840759 - -2.98530682793571600087)/(2 * 0.000001) gradient[3]: 0.00028814905276656572, (-2.98530683024090848221 - -2.98530684752985164820)/(2 * 0.000030) gradient[4]: -0.00011525961851610587, (-2.98530684176687044484 - -2.98530683254610096355)/(2 * 0.000040) gradient[5]: -0.00010565465539495258, 
(-2.98530683369869720423 - -2.98530682102013855683)/(2 * 0.000060) gradient[6]: 0.00011525961851610587, (-2.98530683715648592624 - -2.98530684637725540753)/(2 * 0.000040) gradient[7]: 0.00086444718050415759, (-2.98530684061427420417 - -2.98530684752985164820)/(2 * 0.000004) gradient[8]: -0.00518668286098034059, (-2.98530684291946668552 - -2.98530683254610096355)/(2 * 0.000001) gradient[9]: -0.00013446955401027102, (-2.98530684407206292619 - -2.98530683600388968557)/(2 * 0.000030) gradient[10]: -0.00037459377266735311, (-2.98530684983504412955 - -2.98530681986754231616)/(2 * 0.000040) gradient[11]: -0.00012486458903874600, (-2.98530682332533103818 - -2.98530680834158035353)/(2 * 0.000060) gradient[12]: 0.00048985339118345905, (-2.98530678528965598417 - -2.98530682447792727885)/(2 * 0.000040) gradient[13]: -0.00057629806482495383, (-2.98530684291946668552 - -2.98530683830908216692)/(2 * 0.000004) line search starting at fitness: -2.985306841766870 initial point: [14]: 0.57171300000000002672, 12.31211899999999914712, -3.30518700000000009709, 148.01025699999999574175, 22.45390199999999936153, 0.42035000000000000142, -0.46885799999999999699, 0.76057900000000000507, -1.36164400000000007651, 177.88423800000001051558, 23.88289199999999823376, 1.21063900000000002066, -1.61197400000000001796, 8.53437800000000024170 loop 1, evaluations: 1, step: 1.000000000000000, fitness: -2.985306747253981 loop 2, evaluations: 2, step: 2.000000000000000, fitness: -2.985306733422826 loop 2, evaluations: 3, step: 4.000000000000000, fitness: -2.985306850987640 loop 3, evaluations: 4, step: 1.785714283530075, fitness: -2.985306755322155 loop 3, evaluations: 5, step: 2.595721102268431, fitness: -2.985306818714947 loop 3, evaluations: 6, step: 2.061540494505907, fitness: -2.985306771458502 loop 3, evaluations: 7, step: 1.912425577977664, fitness: -2.985306748406577 loop 3, evaluations: 8, step: 1.972377618796003, fitness: -2.985306729965038 loop 3, evaluations: 9, step: 1.973523614802404, 
fitness: -2.985306733422826 |
Send message Joined: 30 Nov 08 Posts: 11 Credit: 25,658 RAC: 0 |
out file hessian [14 x 14]: 882.45647594797912915965 -6.30326069117614817827 -792.40988770656883843913 -69.63602102357431533619 54.02794739373106125413 55.22856894035754748984 21.61117881871454571296 -198.10247886553611351701 -1152.59621291663461306598 36.01863252100656609400 10.80558975630196805184 -22.81180013404456374815 90.04657991473762024270 108.05589062412579437478 -6.30326069117614817827 2.70139735233931821412 3.60186325210065616531 3.36173894277536033925 -1.44074526614579290218 -0.42021737941174319708 3.42176998541221477623 0.90046581302516404133 64.83353715003303818776 0.24012417054741772016 0.09004658130251640136 0.42021735628209683222 2.07107130056893806724 -3.60186325210065616531 -792.40988770656883843913 3.60186325210065616531 -1152.59635169451257752371 96.04968302194076557043 -36.01863252100655898857 28.81490786717695939956 0.00000000000000000000 288.14903241247691312310 3169.63955082627535375650 19.20993364379341983295 -93.64844177905949607066 0.00000185037170770859 7.20372650420131233062 -432.22359025207879312802 -69.63602102357431533619 3.36173894277536033925 96.04968302194076557043 3.52182172314030594862 -4.56235993429032671287 1.28066248963578899200 -1.20062099151496659566 -7.20372650420131321880 -67.23477700513551269523 1.92099373445368359903 -1.44074520832167696227 -0.32016562240894724800 -0.48024834109483544031 0.00000000000000000000 54.02794739373106125413 -1.44074526614579290218 -36.01863252100655898857 -4.56235993429032671287 1.08055890624125772170 -2.64136629235522901737 1.62083853283423429126 -1.80093162605032808266 7.20372927975887389351 5.28273267722904371624 6.30326055239827010013 -2.88149055542123200269 0.18009316260503280271 5.40279487815098402592 55.22856894035754748984 -0.42021737941174319708 28.81490786717695939956 1.28066248963578899200 -2.64136629235522901737 3.92202875115148996699 -7.32378842756749648402 7.20372696679423984989 33.61738850256775634762 4.16215302963725708452 1.08055892937090414208 0.80041402518283966128 
-0.72037265042013121086 1.20062108403355227715 21.61117881871454571296 3.42176998541221477623 0.00000000000000000000 -1.20062099151496659566 1.62083853283423429126 -7.32378842756749648402 1.08055883685231868263 -12.60652068846290596582 -108.05589756301968407115 -2.88149050916193960603 -3.24167685750165146530 4.08211154693619882039 1.08055897563019676078 14.40745231451323427052 -198.10247886553611351701 0.90046581302516404133 288.14903241247691312310 -7.20372650420131321880 -1.80093162605032808266 7.20372696679423984989 -12.60652068846290596582 36.01862558211266218677 -648.33538537811818969203 -2.40124216806710455430 -45.02328926347941973063 -25.21304230211167052289 -19.81024719266421740826 -36.01863252100656609400 -1152.59621291663461306598 64.83353715003303818776 3169.63955082627535375650 -67.23477700513551269523 7.20372927975887389351 33.61738850256775634762 -108.05589756301968407115 -648.33538537811818969203 6051.13004148449817876099 -76.83974937814734573749 201.70433656652161857892 -24.01241983029933635407 151.27825103711242604732 -216.11179512603939656401 36.01863252100656609400 0.24012417054741772016 19.20993364379341983295 1.92099373445368359903 5.28273267722904371624 4.16215302963725708452 -2.88149050916193960603 -2.40124216806710455430 -76.83974937814734573749 1.28066236627767526812 -1.68086951764697278833 2.08107648397910027782 -6.96360219487601650457 14.40745300840262643760 10.80558975630196805184 0.09004658130251640136 -93.64844177905949607066 -1.44074520832167696227 6.30326055239827010013 1.08055892937090414208 -3.24167685750165146530 -45.02328926347941973063 201.70433656652161857892 -1.68086951764697278833 5.76298106458317160872 0.24012421680671039437 2.88149053229158580436 -10.80558975630196805184 -22.81180013404456374815 0.42021735628209683222 0.00000185037170770859 -0.32016562240894724800 -2.88149055542123200269 0.80041402518283966128 4.08211154693619882039 -25.21304230211167052289 -24.01241983029933635407 2.08107648397910027782 
0.24012421680671039437 3.28169753717312406849 2.28117996714516335643 -2.40124216806710455430 90.04657991473762024270 2.07107130056893806724 7.20372650420131233062 -0.48024834109483544031 0.18009316260503280271 -0.72037265042013121086 1.08055897563019676078 -19.81024719266421740826 151.27825103711242604732 -6.96360219487601650457 2.88149053229158580436 2.28117996714516335643 20.35052688864613301689 -16.20838463445295474230 108.05589062412579437478 -3.60186325210065616531 -432.22359025207879312802 0.00000000000000000000 5.40279487815098402592 1.20062108403355227715 14.40745231451323427052 -36.01863252100656609400 -216.11179512603939656401 14.40745300840262643760 -10.80558975630196805184 -2.40124216806710455430 -16.20838463445295474230 198.10247192664220960978 gradient[14]: -0.00633927915716370194, -0.00025933414860013215, -0.00576298120336105058, 0.00028814905276656572, -0.00011525961851610587, -0.00010565465539495258, 0.00011525961851610587, 0.00086444718050415759, -0.00518668286098034059, -0.00013446955401027102, -0.00037459377266735311, -0.00012486458903874600, 0.00048985339118345905, -0.00057629806482495383 initial_fitness: -2.98530684176687044484 inital_parameters[14]: 0.57171300000000002672, 12.31211899999999914712, -3.30518700000000009709, 148.01025699999999574175, 22.45390199999999936153, 0.42035000000000000142, -0.46885799999999999699, 0.76057900000000000507, -1.36164400000000007651, 177.88423800000001051558, 23.88289199999999823376, 1.21063900000000002066, -1.61197400000000001796, 8.53437800000000024170 result_fitness: -2.98530673342282648619 result_parameters[14]: 0.57171023310301738451, 12.31159045077501801302, -3.30517495098843516743, 148.01059727942629251629, 22.45401028483425420745, 0.42010611491115568139, -0.46892261685864594645, 0.76055531511989238336, -1.36164776621840832860, 177.88429089670174221283, 23.88286053550067578044, 1.21053790484657564086, -1.61181317710548888122, 8.53439162571984688554 number_evaluations: 444 metadata: it: 5, ev: 588 |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
Sweet. Looks like you're getting what I'm getting. Is that for stripe 20? |
Send message Joined: 30 Nov 08 Posts: 11 Credit: 25,658 RAC: 0 |
Yes, that is for stripe 20. |
Send message Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0 |
initial likelihood: -2.98530684176687044484 Sweet. Looks like you're getting what I'm getting. Is that for stripe 20? If that is okay, I guess my single precision ATI implementation will do, too. I'm getting a fitness of -2.985312812926748 for the stripe20 test unit. As a reference point, the stock CPU app (using a complete DP calculation) arrives at a value of -2.985312797571472. So it appears my approach is actually a bit (two digits, i.e. about 6 bits) more precise :D At the moment I'm using the integration layout mentioned in the other thread (mu-r plane) and doing all summations with the Kahan method. This includes the convolution loop and the summation of all the values between different mu-r planes, as well as the final reduction (done on the GPU in SP as a treelike Kahan sum). That way I have to transfer virtually nothing (16 bytes or so for the whole integral) back from the GPU to the CPU. It appears to me that doing the reduction on the CPU is unnecessary. But I have to mention that I do all CPU operations (including the likelihood computation) in DP. I have to test whether one loses the precision there (or have you tried it already, Travis?). |
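For readers who have not met it, the Kahan (compensated) summation mentioned in the post above can be sketched in plain C as follows. This is only a sequential, CPU-side illustration of the technique. It is not the ATI kernel or the treelike GPU reduction described in the post, and the function names `naive_sum_f` and `kahan_sum_f` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Plain single-precision accumulation, shown only for comparison. */
float naive_sum_f(const float *values, size_t n)
{
    assert(values != NULL || n == 0);
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i)
        sum += values[i];
    return sum;
}

/* Kahan (compensated) summation in single precision.
 * comp carries the rounding error of the previous addition so that it
 * can be subtracted from the next term before that term is added. */
float kahan_sum_f(const float *values, size_t n)
{
    assert(values != NULL || n == 0);
    float sum = 0.0f;
    float comp = 0.0f;               /* running compensation term */
    for (size_t i = 0; i < n; ++i) {
        float y = values[i] - comp;  /* apply the correction from the last step */
        float t = sum + y;           /* low-order bits of y may be lost here */
        comp = (t - sum) - y;        /* recover what was lost (with opposite sign) */
        sum = t;
    }
    return sum;
}
```

Summing many small single-precision terms with `naive_sum_f` drifts noticeably, while `kahan_sum_f` stays close to the exact result. Note that compiler options which allow floating-point reassociation (for example `-ffast-math` in GCC) can optimize the compensation term away, so a sketch like this should be built without them.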
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
I'm working on getting the Kahan summation working to see how much that improves the accuracy. The general plan for milkyway_gpu is to have 2 types of applications: 1. single precision (probably using Kahan summation in the kernel for highest accuracy) for GPUs that don't support double precision 2. double precision for GPUs that do support double precision The server will have different validation for comparing floating point <-> floating point, floating point <-> double, double <-> double results, and we'll update our searches using the floating point values for exploration and double values for highly accurate exploitation. I'm hoping to have the code updated tomorrow with double and single precision kernels (with the single precision kernel doing a Kahan summation). I'll post the values I'm getting for the different streams made available in the code package and we can work from there. After that the main priority will be getting workunits available on milkyway_gpu. |
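The precision-dependent validation described in the post above could look roughly like the following C sketch. Everything here is an assumption made for illustration: the names `precision_t` and `fitness_matches` are hypothetical, and the tolerances (1e-5 when either result is single precision, 1e-12 when both are double) are placeholder values, not the ones the milkyway server uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tag for the precision a result was computed in. */
typedef enum { PRECISION_SINGLE, PRECISION_DOUBLE } precision_t;

static double abs_d(double x)
{
    return x < 0.0 ? -x : x;
}

/* Assumed tolerances: the looser one applies whenever either side
 * of the comparison is a single-precision result. */
static double tolerance_for(precision_t a, precision_t b)
{
    if (a == PRECISION_SINGLE || b == PRECISION_SINGLE)
        return 1e-5;
    return 1e-12;
}

/* Returns true when two fitness values agree within the tolerance
 * chosen for their precisions, relative to the larger magnitude
 * (falling back to an absolute tolerance for values below 1). */
bool fitness_matches(double fa, precision_t pa, double fb, precision_t pb)
{
    double tol = tolerance_for(pa, pb);
    assert(tol > 0.0);
    double scale = abs_d(fa) > abs_d(fb) ? abs_d(fa) : abs_d(fb);
    if (scale < 1.0)
        scale = 1.0;
    return abs_d(fa - fb) <= tol * scale;
}
```

With these assumed tolerances, the single-precision fitness -2.985312812926748 and the double-precision value -2.985312797571472 quoted earlier in the thread would validate against each other as a single <-> double pair, but would not pass as a double <-> double pair.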
Send message Joined: 30 Nov 08 Posts: 11 Credit: 25,658 RAC: 0 |
CPU(intel core2 e6750) vs GPU(nv geforce 9600gt) computation times linux_x86_64 on stripe-20. Could you explain this? CPU: real 4m15.913s user 4m6.327s sys 0m0.800s GPU: real 15m45.243s user 15m30.878s sys 0m0.484s About the makefile: what is the reason for the difference between these two lines? line 48: LINUX_LDFLAGS_i686 = -L/usr/X11R6/lib -L/usr/local/lib line 53: LINUX_LDFLAGS_x86_64 = -L/usr/local/lib |
Send message Joined: 22 Feb 08 Posts: 260 Credit: 57,387,048 RAC: 0 |
looks like the cpu is faster... CPU(intel core2 e6750) vs GPU(nv geforce 9600gt) computation times linux_x86_64 on stripe-20. Could you explain this? mic. |
Send message Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0 |
looks like the cpu is faster... Or the GPU calculates quite a bit more ;) |
Send message Joined: 22 Feb 08 Posts: 260 Credit: 57,387,048 RAC: 0 |
looks like the cpu is faster... ;)) mic. |
©2024 Astroinformatics Group