Travis (Volunteer moderator, Project administrator, Project developer, Project tester, Project scientist)
Joined: 30 Aug 07 Posts: 1976 Credit: 26,480 RAC: 0
Here's a thread for discussing (and improving) the makefile we're using for milkyway. The newest code release has a combined Linux, OSX and GPU makefile. I've tested it on OSX and it works fine -- unfortunately AFAIK there is no 64 bit or PPC version of CUDA for OSX so it will only compile an i686 binary.
I don't have a Linux machine with a GPU to test the makefile, so let me know if it works there (I'm pretty sure it should, as it shouldn't be doing anything different than on OSX).
____________
I have a 64 bit Linux machine with a CUDA capable card. Getting it to work with BOINC hasn't gone well.
If you can give a general idea of what you'd like me to do, I'd be happy to get compiling ;)
____________
Kathryn :o)
The BOINC FAQ Service
The Unofficial BOINC Wiki
The Trac System
More BOINC information than you can shake a stick of RAM at.
Travis
I have a 64 bit Linux machine with a CUDA capable card. Getting it to work with BOINC hasn't gone well.
If you can give a general idea of what you'd like me to do, I'd be happy to get compiling ;)
Welp, your guess is as good as mine :) JK.
Well, you'd need to download the CUDA driver and toolkit: http://www.nvidia.com/object/cuda_get.html
You can test and see if it works with the samples.
After that, you should be able to just download the latest GPU code from: http://milkyway.cs.rpi.edu/milkyway/download/code_release/
After unzipping, you should be able to go to the /milkyway/bin/ directory and try running the makefile:
make linux_x86_64_gpu
You'll probably need to edit the makefile so that its directories point to where you have BOINC and CUDA installed.
____________
Some questions about compiling with make linux_x86_64_gpu:
1) What is "evaluator.h", which is referenced in evaluation/simple_evaluator.c and searches/[hessian,line_search,gradient].c?
Is it evaluation/simple_evaluator.h?
2) The evaluate function has no visible declaration in searches/[gradient,hessian,regression,line_search].c:
../searches/hessian.c:127: error: 'evaluate' was not declared in this scope
../searches/hessian.c: In function 'void get_hessian(int, double*, double*, double**)':
../searches/hessian.c:188: error: 'evaluate' was not declared in this scope
../searches/hessian.c:196: error: 'evaluate' was not declared in this scope
make: *** [../searches/hessian.o] Error 1
PS: Sorry for my English.
____________
I've tested it on OSX and it works fine -- unfortunately AFAIK there is no 64 bit or PPC version of CUDA for OSX so it will only compile an i686 binary.
AFAIK, there is no CUDA library for OS X, period. Nvidia doesn't release separate drivers, so they have to work with Apple to get them into an OS update. With Apple pushing OpenCL though, they may have to go with OpenCL first. I don't know what Apple's schedule is on that.
____________
Travis
I've tested it on OSX and it works fine -- unfortunately AFAIK there is no 64 bit or PPC version of CUDA for OSX so it will only compile an i686 binary.
AFAIK, there is no CUDA library for OS X, period. Nvidia doesn't release separate drivers, so they have to work with Apple to get them into an OS update. With Apple pushing OpenCL though, they may have to go with OpenCL first. I don't know what Apple's schedule is on that.
There's a 32 bit CUDA library for Intel Macs. That's what I've been using.
____________
Travis
Some questions about compilation with make linux_x86_64_gpu
1) what is "evaluator.h" in evaluation/simple_evaluator.c , searches/[hessian,line_search,gradient].c ?
is it evaluation/simple_evaluator.h ?
2) evaluate function in searches/[gradient,hessian,regression,line_search].c no visible declaration.
../searches/hessian.c:127: error: 'evaluate' was not declared in this scope
../searches/hessian.c: In function 'void get_hessian(int, double*, double*, double**)':
../searches/hessian.c:188: error: 'evaluate' was not declared in this scope
../searches/hessian.c:196: error: 'evaluate' was not declared in this scope
make: *** [../searches/hessian.o] Error 1
PS: sorry for english
Looks like I missed yet another file :( I'll update the v0.05 release.
*update*
Ok it should be in there now.
____________
bin/Makefile, line 147: a space is missing between the output file name and $(APP_OBJS).
Current:
$(OBJ_CXX) $(OBJ_CXXFLAGS) $(LDFLAGS_x86_64) -o milkywayGPU_$(APP_VERSION)_x86_64-pc-linux-gnu$(APP_OBJS) $(SEARCH_OBJS) $(UTIL_OBJS) $(GPU_APP_OBJS) -lboinc -lboinc_api -lcudart
Fixed:
$(OBJ_CXX) $(OBJ_CXXFLAGS) $(LDFLAGS_x86_64) -o milkywayGPU_$(APP_VERSION)_x86_64-pc-linux-gnu $(APP_OBJS) $(SEARCH_OBJS) $(UTIL_OBJS) $(GPU_APP_OBJS) -lboinc -lboinc_api -lcudart
Travis
bin/Makefile line 147
missing space
$(OBJ_CXX) $(OBJ_CXXFLAGS) $(LDFLAGS_x86_64) -o milkywayGPU_$(APP_VERSION)_x86_64-pc-linux-gnu$(APP_OBJS) $(SEARCH_OBJS) $(UTIL_OBJS) $(GPU_APP_OBJS) -lboinc -lboinc_api -lcudart
$(OBJ_CXX) $(OBJ_CXXFLAGS) $(LDFLAGS_x86_64) -o milkywayGPU_$(APP_VERSION)_x86_64-pc-linux-gnu $(APP_OBJS) $(SEARCH_OBJS) $(UTIL_OBJS) $(GPU_APP_OBJS) -lboinc -lboinc_api -lcudart
Nice catch, it'll be in the next update.
____________
linux_x86_64_gpu:
Maybe it's a problem on my side, but linking against libboinc_api.a fails without OpenSSL.
I just added -lssl to line 147.
How do I run the test units with milkywayGPU_0.18_x86_64-pc-linux-gnu?
Update:
I renamed *-20.txt to *txt.
Executing...
Looks like it works.
____________
Output from linux_x86_64_gpu; sorry for the huge post.
initial likelihood: -2.98530684176687044484
point[14]: 0.57171300000000002672, 12.31211899999999914712, -3.30518700000000009709, 148.01025699999999574175, 22.45390199999999936153, 0.42035000000000000142, -0.46885799999999999699, 0.76057900000000000507, -1.36164400000000007651, 177.88423800000001051558, 23.88289199999999823376, 1.21063900000000002066, -1.61197400000000001796, 8.53437800000000024170
step[14]: 0.00000400000000000000, 0.00008000000000000001, 0.00000100000000000000, 0.00003000000000000000, 0.00004000000000000000, 0.00006000000000000000, 0.00004000000000000000, 0.00000400000000000000, 0.00000100000000000000, 0.00003000000000000000, 0.00004000000000000000, 0.00006000000000000000, 0.00004000000000000000, 0.00000400000000000000
hessian[0][0] = 882.45647594797912915965, (-2.98530682678311976019 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530680027340666882)/(4 * 0.00000400000000000000 * 0.00000400000000000000)
hessian[0][1] = hessian[1][0] = -6.30326069117614817827, (-2.98530681295196531622 - -2.98530680373119539084 - -2.98530676800071281818 + -2.98530676684811657751)/(4 * 0.00000400000000000000 * 0.00008000000000000001)
hessian[0][2] = hessian[2][0] = -792.40988770656883843913, (-2.98530683369869720423 - -2.98530681525715779756 - -2.98530677952667522490 + -2.98530677376369402154)/(4 * 0.00000400000000000000 * 0.00000100000000000000)
hessian[0][3] = hessian[3][0] = -69.63602102357431533619, (-2.98530682908831224154 - -2.98530681756235027891 - -2.98530676800071281818 + -2.98530678990004094686)/(4 * 0.00000400000000000000 * 0.00003000000000000000)
hessian[0][4] = hessian[4][0] = 54.02794739373106125413, (-2.98530681525715779756 - -2.98530682908831224154 - -2.98530679335782966888 + -2.98530677261109778087)/(4 * 0.00000400000000000000 * 0.00004000000000000000)
hessian[0][5] = hessian[5][0] = 55.22856894035754748984, (-2.98530680488379163151 - -2.98530681525715779756 - -2.98530681525715779756 + -2.98530677261109778087)/(4 * 0.00000400000000000000 * 0.00006000000000000000)
hessian[0][6] = hessian[6][0] = 21.61117881871454571296, (-2.98530682447792727885 - -2.98530684061427420417 - -2.98530678298446350283 + -2.98530678528965598417)/(4 * 0.00000400000000000000 * 0.00004000000000000000)
hessian[0][7] = hessian[7][0] = -198.10247886553611351701, (-2.98530682793571600087 - -2.98530683139350472288 - -2.98530678298446350283 + -2.98530679912081087224)/(4 * 0.00000400000000000000 * 0.00000400000000000000)
hessian[0][8] = hessian[8][0] = -1152.59621291663461306598, (-2.98530683024090848221 - -2.98530682102013855683 - -2.98530677261109778087 + -2.98530678183186726216)/(4 * 0.00000400000000000000 * 0.00000100000000000000)
hessian[0][9] = hessian[9][0] = 36.01863252100656609400, (-2.98530681986754231616 - -2.98530683485129344490 - -2.98530678644225222484 + -2.98530678413705974350)/(4 * 0.00000400000000000000 * 0.00003000000000000000)
hessian[0][10] = hessian[10][0] = 10.80558975630196805184, (-2.98530681525715779756 - -2.98530681871494651958 - -2.98530678759484846552 + -2.98530678413705974350)/(4 * 0.00000400000000000000 * 0.00004000000000000000)
hessian[0][11] = hessian[11][0] = -22.81180013404456374815, (-2.98530680949417659420 - -2.98530678413705974350 - -2.98530681525715779756 + -2.98530681179936907554)/(4 * 0.00000400000000000000 * 0.00006000000000000000)
hessian[0][12] = hessian[12][0] = 90.04657991473762024270, (-2.98530678874744470619 - -2.98530683830908216692 - -2.98530680834158035353 + -2.98530680027340666882)/(4 * 0.00000400000000000000 * 0.00004000000000000000)
hessian[0][13] = hessian[13][0] = 108.05589062412579437478, (-2.98530683254610096355 - -2.98530684176687044484 - -2.98530678874744470619 + -2.98530679105263718753)/(4 * 0.00000400000000000000 * 0.00000400000000000000)
hessian[1][1] = 2.70139735233931821412, (-2.98530683485129344490 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530677952667522490)/(4 * 0.00008000000000000001 * 0.00008000000000000001)
hessian[1][2] = hessian[2][1] = 3.60186325210065616531, (-2.98530682678311976019 - -2.98530682102013855683 - -2.98530681410456155689 + -2.98530680718898411286)/(4 * 0.00008000000000000001 * 0.00000100000000000000)
hessian[1][3] = hessian[3][1] = 3.36173894277536033925, (-2.98530682908831224154 - -2.98530684522465916686 - -2.98530680027340666882 + -2.98530678413705974350)/(4 * 0.00008000000000000001 * 0.00003000000000000000)
hessian[1][4] = hessian[4][1] = -1.44074526614579290218, (-2.98530684868244788888 - -2.98530682102013855683 - -2.98530679681561839089 + -2.98530678759484846552)/(4 * 0.00008000000000000001 * 0.00004000000000000000)
hessian[1][5] = hessian[5][1] = -0.42021737941174319708, (-2.98530682563052351952 - -2.98530680718898411286 - -2.98530678874744470619 + -2.98530677837407898423)/(4 * 0.00008000000000000001 * 0.00006000000000000000)
hessian[1][6] = hessian[6][1] = 3.42176998541221477623, (-2.98530681295196531622 - -2.98530684983504412955 - -2.98530680142600290949 + -2.98530679451042590955)/(4 * 0.00008000000000000001 * 0.00004000000000000000)
hessian[1][7] = hessian[7][1] = 0.90046581302516404133, (-2.98530683485129344490 - -2.98530683485129344490 - -2.98530679105263718753 + -2.98530678990004094686)/(4 * 0.00008000000000000001 * 0.00000400000000000000)
hessian[1][8] = hessian[8][1] = 64.83353715003303818776, (-2.98530682102013855683 - -2.98530682908831224154 - -2.98530681986754231616 + -2.98530680718898411286)/(4 * 0.00008000000000000001 * 0.00000100000000000000)
hessian[1][9] = hessian[9][1] = 0.24012417054741772016, (-2.98530683369869720423 - -2.98530683024090848221 - -2.98530680142600290949 + -2.98530679566302215022)/(4 * 0.00008000000000000001 * 0.00003000000000000000)
hessian[1][10] = hessian[10][1] = 0.09004658130251640136, (-2.98530683369869720423 - -2.98530683485129344490 - -2.98530679335782966888 + -2.98530679335782966888)/(4 * 0.00008000000000000001 * 0.00004000000000000000)
hessian[1][11] = hessian[11][1] = 0.42021735628209683222, (-2.98530683139350472288 - -2.98530681179936907554 - -2.98530682217273479750 + -2.98530679451042590955)/(4 * 0.00008000000000000001 * 0.00006000000000000000)
hessian[1][12] = hessian[12][1] = 2.07107130056893806724, (-2.98530677261109778087 - -2.98530682563052351952 - -2.98530676915330905885 + -2.98530679566302215022)/(4 * 0.00008000000000000001 * 0.00004000000000000000)
hessian[1][13] = hessian[13][1] = -3.60186325210065616531, (-2.98530683254610096355 - -2.98530683139350472288 - -2.98530678528965598417 + -2.98530678874744470619)/(4 * 0.00008000000000000001 * 0.00000400000000000000)
hessian[2][2] = -1152.59635169451257752371, (-2.98530684868244788888 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530683946167840759)/(4 * 0.00000100000000000000 * 0.00000100000000000000)
hessian[2][3] = hessian[3][2] = 96.04968302194076557043, (-2.98530682217273479750 - -2.98530684637725540753 - -2.98530682217273479750 + -2.98530683485129344490)/(4 * 0.00000100000000000000 * 0.00003000000000000000)
hessian[2][4] = hessian[4][2] = -36.01863252100655898857, (-2.98530683715648592624 - -2.98530682102013855683 - -2.98530683254610096355 + -2.98530682217273479750)/(4 * 0.00000100000000000000 * 0.00004000000000000000)
hessian[2][5] = hessian[5][2] = 28.81490786717695939956, (-2.98530682447792727885 - -2.98530681756235027891 - -2.98530681756235027891 + -2.98530680373119539084)/(4 * 0.00000100000000000000 * 0.00006000000000000000)
hessian[2][6] = hessian[6][2] = 0.00000000000000000000, (-2.98530682678311976019 - -2.98530684061427420417 - -2.98530681410456155689 + -2.98530682793571600087)/(4 * 0.00000100000000000000 * 0.00004000000000000000)
hessian[2][7] = hessian[7][2] = 288.14903241247691312310, (-2.98530683715648592624 - -2.98530683139350472288 - -2.98530684291946668552 + -2.98530683254610096355)/(4 * 0.00000100000000000000 * 0.00000400000000000000)
hessian[2][8] = hessian[8][2] = 3169.63955082627535375650, (-2.98530683254610096355 - -2.98530684868244788888 - -2.98530683369869720423 + -2.98530683715648592624)/(4 * 0.00000100000000000000 * 0.00000100000000000000)
hessian[2][9] = hessian[9][2] = 19.20993364379341983295, (-2.98530683946167840759 - -2.98530683139350472288 - -2.98530682908831224154 + -2.98530681871494651958)/(4 * 0.00000100000000000000 * 0.00003000000000000000)
hessian[2][10] = hessian[10][2] = -93.64844177905949607066, (-2.98530684868244788888 - -2.98530680488379163151 - -2.98530683946167840759 + -2.98530681064677283487)/(4 * 0.00000100000000000000 * 0.00004000000000000000)
hessian[2][11] = hessian[11][2] = 0.00000185037170770859, (-2.98530682102013855683 - -2.98530679451042590955 - -2.98530681525715779756 + -2.98530678874744470619)/(4 * 0.00000100000000000000 * 0.00006000000000000000)
hessian[2][12] = hessian[12][2] = 7.20372650420131233062, (-2.98530676454292409616 - -2.98530681295196531622 - -2.98530676339032785549 + -2.98530681064677283487)/(4 * 0.00000100000000000000 * 0.00004000000000000000)
hessian[2][13] = hessian[13][2] = -432.22359025207879312802, (-2.98530683715648592624 - -2.98530682678311976019 - -2.98530682678311976019 + -2.98530682332533103818)/(4 * 0.00000100000000000000 * 0.00000400000000000000)
hessian[3][3] = 3.52182172314030594862, (-2.98530682793571600087 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530684291946668552)/(4 * 0.00003000000000000000 * 0.00003000000000000000)
hessian[3][4] = hessian[4][3] = -4.56235993429032671287, (-2.98530684176687044484 - -2.98530681871494651958 - -2.98530684522465916686 + -2.98530684407206292619)/(4 * 0.00003000000000000000 * 0.00004000000000000000)
hessian[3][5] = hessian[5][3] = 1.28066248963578899200, (-2.98530682332533103818 - -2.98530681986754231616 - -2.98530683369869720423 + -2.98530682102013855683)/(4 * 0.00003000000000000000 * 0.00006000000000000000)
hessian[3][6] = hessian[6][3] = -1.20062099151496659566, (-2.98530684637725540753 - -2.98530685098764037022 - -2.98530683254610096355 + -2.98530684291946668552)/(4 * 0.00003000000000000000 * 0.00004000000000000000)
hessian[3][7] = hessian[7][3] = -7.20372650420131321880, (-2.98530683254610096355 - -2.98530682793571600087 - -2.98530684637725540753 + -2.98530684522465916686)/(4 * 0.00003000000000000000 * 0.00000400000000000000)
hessian[3][8] = hessian[8][3] = -67.23477700513551269523, (-2.98530684407206292619 - -2.98530682332533103818 - -2.98530685444542909224 + -2.98530684176687044484)/(4 * 0.00003000000000000000 * 0.00000100000000000000)
hessian[3][9] = hessian[9][3] = 1.92099373445368359903, (-2.98530683369869720423 - -2.98530683024090848221 - -2.98530685214023661089 + -2.98530684176687044484)/(4 * 0.00003000000000000000 * 0.00003000000000000000)
hessian[3][10] = hessian[10][3] = -1.44074520832167696227, (-2.98530685098764037022 - -2.98530681756235027891 - -2.98530685329283285157 + -2.98530682678311976019)/(4 * 0.00003000000000000000 * 0.00004000000000000000)
hessian[3][11] = hessian[11][3] = -0.32016562240894724800, (-2.98530682908831224154 - -2.98530681525715779756 - -2.98530682102013855683 + -2.98530680949417659420)/(4 * 0.00003000000000000000 * 0.00006000000000000000)
hessian[3][12] = hessian[12][3] = -0.48024834109483544031, (-2.98530678298446350283 - -2.98530683139350472288 - -2.98530677952667522490 + -2.98530683024090848221)/(4 * 0.00003000000000000000 * 0.00004000000000000000)
hessian[3][13] = hessian[13][3] = 0.00000000000000000000, (-2.98530683254610096355 - -2.98530683024090848221 - -2.98530684868244788888 + -2.98530684637725540753)/(4 * 0.00003000000000000000 * 0.00000400000000000000)
hessian[4][4] = 1.08055890624125772170, (-2.98530683600388968557 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530684061427420417)/(4 * 0.00004000000000000000 * 0.00004000000000000000)
hessian[4][5] = hessian[5][4] = -2.64136629235522901737, (-2.98530684061427420417 - -2.98530681986754231616 - -2.98530681756235027891 + -2.98530682217273479750)/(4 * 0.00004000000000000000 * 0.00006000000000000000)
hessian[4][6] = hessian[6][4] = 1.62083853283423429126, (-2.98530684176687044484 - -2.98530685905581405493 - -2.98530683715648592624 + -2.98530684407206292619)/(4 * 0.00004000000000000000 * 0.00004000000000000000)
hessian[4][7] = hessian[7][4] = -1.80093162605032808266, (-2.98530685444542909224 - -2.98530685214023661089 - -2.98530683715648592624 + -2.98530683600388968557)/(4 * 0.00004000000000000000 * 0.00000400000000000000)
hessian[4][8] = hessian[8][4] = 7.20372927975887389351, (-2.98530684291946668552 - -2.98530683715648592624 - -2.98530683600388968557 + -2.98530682908831224154)/(4 * 0.00004000000000000000 * 0.00000100000000000000)
hessian[4][9] = hessian[9][4] = 5.28273267722904371624, (-2.98530684983504412955 - -2.98530684637725540753 - -2.98530683830908216692 + -2.98530680949417659420)/(4 * 0.00004000000000000000 * 0.00003000000000000000)
hessian[4][10] = hessian[10][4] = 6.30326055239827010013, (-2.98530684061427420417 - -2.98530683139350472288 - -2.98530684407206292619 + -2.98530679451042590955)/(4 * 0.00004000000000000000 * 0.00004000000000000000)
hessian[4][11] = hessian[11][4] = -2.88149055542123200269, (-2.98530682908831224154 - -2.98530680142600290949 - -2.98530681756235027891 + -2.98530681756235027891)/(4 * 0.00004000000000000000 * 0.00006000000000000000)
hessian[4][12] = hessian[12][4] = 0.18009316260503280271, (-2.98530678644225222484 - -2.98530683254610096355 - -2.98530678528965598417 + -2.98530683024090848221)/(4 * 0.00004000000000000000 * 0.00004000000000000000)
hessian[4][13] = hessian[13][4] = 5.40279487815098402592, (-2.98530683830908216692 - -2.98530683715648592624 - -2.98530683024090848221 + -2.98530682563052351952)/(4 * 0.00004000000000000000 * 0.00000400000000000000)
hessian[5][5] = 3.92202875115148996699, (-2.98530683254610096355 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530679451042590955)/(4 * 0.00006000000000000000 * 0.00006000000000000000)
hessian[5][6] = hessian[6][5] = -7.32378842756749648402, (-2.98530684291946668552 - -2.98530684176687044484 - -2.98530677145850154020 + -2.98530684061427420417)/(4 * 0.00006000000000000000 * 0.00004000000000000000)
hessian[5][7] = hessian[7][5] = 7.20372696679423984989, (-2.98530682793571600087 - -2.98530683946167840759 - -2.98530681756235027891 + -2.98530682217273479750)/(4 * 0.00006000000000000000 * 0.00000400000000000000)
hessian[5][8] = hessian[8][5] = 33.61738850256775634762, (-2.98530683715648592624 - -2.98530682447792727885 - -2.98530683600388968557 + -2.98530681525715779756)/(4 * 0.00006000000000000000 * 0.00000100000000000000)
hessian[5][9] = hessian[9][5] = 4.16215302963725708452, (-2.98530682793571600087 - -2.98530683830908216692 - -2.98530684061427420417 + -2.98530682102013855683)/(4 * 0.00006000000000000000 * 0.00003000000000000000)
hessian[5][10] = hessian[10][5] = 1.08055892937090414208, (-2.98530683139350472288 - -2.98530682217273479750 - -2.98530683254610096355 + -2.98530681295196531622)/(4 * 0.00006000000000000000 * 0.00004000000000000000)
hessian[5][11] = hessian[11][5] = 0.80041402518283966128, (-2.98530681640975403823 - -2.98530680142600290949 - -2.98530680718898411286 + -2.98530678067927102148)/(4 * 0.00006000000000000000 * 0.00006000000000000000)
hessian[5][12] = hessian[12][5] = -0.72037265042013121086, (-2.98530679220523342821 - -2.98530680949417659420 - -2.98530678644225222484 + -2.98530681064677283487)/(4 * 0.00006000000000000000 * 0.00004000000000000000)
hessian[5][13] = hessian[13][5] = 1.20062108403355227715, (-2.98530683830908216692 - -2.98530683369869720423 - -2.98530682563052351952 + -2.98530681986754231616)/(4 * 0.00006000000000000000 * 0.00000400000000000000)
hessian[6][6] = 1.08055883685231868263, (-2.98530683830908216692 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530683830908216692)/(4 * 0.00004000000000000000 * 0.00004000000000000000)
hessian[6][7] = hessian[7][6] = -12.60652068846290596582, (-2.98530684176687044484 - -2.98530683600388968557 - -2.98530685214023661089 + -2.98530685444542909224)/(4 * 0.00004000000000000000 * 0.00000400000000000000)
hessian[6][8] = hessian[8][6] = -108.05589756301968407115, (-2.98530685329283285157 - -2.98530682102013855683 - -2.98530684752985164820 + -2.98530683254610096355)/(4 * 0.00004000000000000000 * 0.00000100000000000000)
hessian[6][9] = hessian[9][6] = -2.88149050916193960603, (-2.98530685329283285157 - -2.98530683369869720423 - -2.98530685098764037022 + -2.98530684522465916686)/(4 * 0.00004000000000000000 * 0.00003000000000000000)
hessian[6][10] = hessian[10][6] = -3.24167685750165146530, (-2.98530683600388968557 - -2.98530681525715779756 - -2.98530684291946668552 + -2.98530684291946668552)/(4 * 0.00004000000000000000 * 0.00004000000000000000)
hessian[6][11] = hessian[11][6] = 4.08211154693619882039, (-2.98530683139350472288 - -2.98530682102013855683 - -2.98530686481879481420 + -2.98530681525715779756)/(4 * 0.00004000000000000000 * 0.00006000000000000000)
hessian[6][12] = hessian[12][6] = 1.08055897563019676078, (-2.98530680027340666882 - -2.98530684522465916686 - -2.98530680373119539084 + -2.98530684176687044484)/(4 * 0.00004000000000000000 * 0.00004000000000000000)
hessian[6][13] = hessian[13][6] = 14.40745231451323427052, (-2.98530683600388968557 - -2.98530684637725540753 - -2.98530683830908216692 + -2.98530683946167840759)/(4 * 0.00004000000000000000 * 0.00000400000000000000)
hessian[7][7] = 36.01862558211266218677, (-2.98530684176687044484 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530683946167840759)/(4 * 0.00000400000000000000 * 0.00000400000000000000)
hessian[7][8] = hessian[8][7] = -648.33538537811818969203, (-2.98530684868244788888 - -2.98530683369869720423 - -2.98530684291946668552 + -2.98530683830908216692)/(4 * 0.00000400000000000000 * 0.00000100000000000000)
hessian[7][9] = hessian[9][7] = -2.40124216806710455430, (-2.98530684752985164820 - -2.98530683715648592624 - -2.98530684522465916686 + -2.98530683600388968557)/(4 * 0.00000400000000000000 * 0.00003000000000000000)
hessian[7][10] = hessian[10][7] = -45.02328926347941973063, (-2.98530684868244788888 - -2.98530681179936907554 - -2.98530683600388968557 + -2.98530682793571600087)/(4 * 0.00000400000000000000 * 0.00004000000000000000)
hessian[7][11] = hessian[11][7] = -25.21304230211167052289, (-2.98530683946167840759 - -2.98530680488379163151 - -2.98530681640975403823 + -2.98530680603638787218)/(4 * 0.00000400000000000000 * 0.00006000000000000000)
hessian[7][12] = hessian[12][7] = -19.81024719266421740826, (-2.98530679220523342821 - -2.98530683139350472288 - -2.98530677722148274356 + -2.98530682908831224154)/(4 * 0.00000400000000000000 * 0.00004000000000000000)
hessian[7][13] = hessian[13][7] = -36.01863252100656609400, (-2.98530684637725540753 - -2.98530683946167840759 - -2.98530684061427420417 + -2.98530683600388968557)/(4 * 0.00000400000000000000 * 0.00000400000000000000)
hessian[8][8] = 6051.13004148449817876099, (-2.98530682332533103818 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530683600388968557)/(4 * 0.00000100000000000000 * 0.00000100000000000000)
hessian[8][9] = hessian[9][8] = -76.83974937814734573749, (-2.98530685559802533291 - -2.98530684061427420417 - -2.98530683485129344490 + -2.98530682908831224154)/(4 * 0.00000100000000000000 * 0.00003000000000000000)
hessian[8][10] = hessian[10][8] = 201.70433656652161857892, (-2.98530685559802533291 - -2.98530684868244788888 - -2.98530684868244788888 + -2.98530680949417659420)/(4 * 0.00000100000000000000 * 0.00004000000000000000)
hessian[8][11] = hessian[11][8] = -24.01241983029933635407, (-2.98530682678311976019 - -2.98530681064677283487 - -2.98530681295196531622 + -2.98530680257859915017)/(4 * 0.00000100000000000000 * 0.00006000000000000000)
hessian[8][12] = hessian[12][8] = 151.27825103711242604732, (-2.98530678874744470619 - -2.98530684061427420417 - -2.98530676800071281818 + -2.98530679566302215022)/(4 * 0.00000100000000000000 * 0.00004000000000000000)
hessian[8][13] = hessian[13][8] = -216.11179512603939656401, (-2.98530684752985164820 - -2.98530683369869720423 - -2.98530684061427420417 + -2.98530683024090848221)/(4 * 0.00000100000000000000 * 0.00000400000000000000)
hessian[9][9] = 1.28066236627767526812, (-2.98530685098764037022 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530682793571600087)/(4 * 0.00003000000000000000 * 0.00003000000000000000)
hessian[9][10] = hessian[10][9] = -1.68086951764697278833, (-2.98530685675062157358 - -2.98530683024090848221 - -2.98530684061427420417 + -2.98530682217273479750)/(4 * 0.00003000000000000000 * 0.00004000000000000000)
hessian[9][11] = hessian[11][9] = 2.08107648397910027782, (-2.98530682908831224154 - -2.98530680373119539084 - -2.98530682908831224154 + -2.98530678874744470619)/(4 * 0.00003000000000000000 * 0.00006000000000000000)
hessian[9][12] = hessian[12][9] = -6.96360219487601650457, (-2.98530678874744470619 - -2.98530682447792727885 - -2.98530676800071281818 + -2.98530683715648592624)/(4 * 0.00003000000000000000 * 0.00004000000000000000)
hessian[9][13] = hessian[13][9] = 14.40745300840262643760, (-2.98530684752985164820 - -2.98530683369869720423 - -2.98530684868244788888 + -2.98530682793571600087)/(4 * 0.00003000000000000000 * 0.00000400000000000000)
hessian[10][10] = 5.76298106458317160872, (-2.98530684061427420417 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530680603638787218)/(4 * 0.00004000000000000000 * 0.00004000000000000000)
hessian[10][11] = hessian[11][10] = 0.24012421680671039437, (-2.98530680142600290949 - -2.98530678990004094686 - -2.98530683024090848221 + -2.98530681640975403823)/(4 * 0.00004000000000000000 * 0.00006000000000000000)
hessian[10][12] = hessian[12][10] = 2.88149053229158580436, (-2.98530679335782966888 - -2.98530684637725540753 - -2.98530677837407898423 + -2.98530681295196531622)/(4 * 0.00004000000000000000 * 0.00004000000000000000)
hessian[10][13] = hessian[13][10] = -10.80558975630196805184, (-2.98530685790321781425 - -2.98530684983504412955 - -2.98530681871494651958 + -2.98530681756235027891)/(4 * 0.00004000000000000000 * 0.00000400000000000000)
hessian[11][11] = 3.28169753717312406849, (-2.98530682678311976019 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530680949417659420)/(4 * 0.00006000000000000000 * 0.00006000000000000000)
hessian[11][12] = hessian[12][11] = 2.28117996714516335643, (-2.98530679335782966888 - -2.98530682217273479750 - -2.98530677145850154020 + -2.98530677837407898423)/(4 * 0.00006000000000000000 * 0.00004000000000000000)
hessian[11][13] = hessian[13][11] = -2.40124216806710455430, (-2.98530682447792727885 - -2.98530682563052351952 - -2.98530680718898411286 + -2.98530681064677283487)/(4 * 0.00006000000000000000 * 0.00000400000000000000)
hessian[12][12] = 20.35052688864613301689, (-2.98530677145850154020 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530678183186726216)/(4 * 0.00004000000000000000 * 0.00004000000000000000)
hessian[12][13] = hessian[13][12] = -16.20838463445295474230, (-2.98530677145850154020 - -2.98530677030590529952 - -2.98530682102013855683 + -2.98530683024090848221)/(4 * 0.00004000000000000000 * 0.00000400000000000000)
hessian[13][13] = 198.10247192664220960978, (-2.98530684407206292619 - -2.98530684176687044484 - -2.98530684176687044484 + -2.98530682678311976019)/(4 * 0.00000400000000000000 * 0.00000400000000000000)
gradient[0]: -0.00633927915716370194, (-2.98530684061427420417 - -2.98530678990004094686)/(2 * 0.000004)
gradient[1]: -0.00025933414860013215, (-2.98530683254610096355 - -2.98530679105263718753)/(2 * 0.000080)
gradient[2]: -0.00576298120336105058, (-2.98530683946167840759 - -2.98530682793571600087)/(2 * 0.000001)
gradient[3]: 0.00028814905276656572, (-2.98530683024090848221 - -2.98530684752985164820)/(2 * 0.000030)
gradient[4]: -0.00011525961851610587, (-2.98530684176687044484 - -2.98530683254610096355)/(2 * 0.000040)
gradient[5]: -0.00010565465539495258, (-2.98530683369869720423 - -2.98530682102013855683)/(2 * 0.000060)
gradient[6]: 0.00011525961851610587, (-2.98530683715648592624 - -2.98530684637725540753)/(2 * 0.000040)
gradient[7]: 0.00086444718050415759, (-2.98530684061427420417 - -2.98530684752985164820)/(2 * 0.000004)
gradient[8]: -0.00518668286098034059, (-2.98530684291946668552 - -2.98530683254610096355)/(2 * 0.000001)
gradient[9]: -0.00013446955401027102, (-2.98530684407206292619 - -2.98530683600388968557)/(2 * 0.000030)
gradient[10]: -0.00037459377266735311, (-2.98530684983504412955 - -2.98530681986754231616)/(2 * 0.000040)
gradient[11]: -0.00012486458903874600, (-2.98530682332533103818 - -2.98530680834158035353)/(2 * 0.000060)
gradient[12]: 0.00048985339118345905, (-2.98530678528965598417 - -2.98530682447792727885)/(2 * 0.000040)
gradient[13]: -0.00057629806482495383, (-2.98530684291946668552 - -2.98530683830908216692)/(2 * 0.000004)
line search starting at fitness: -2.985306841766870
initial point: [14]: 0.57171300000000002672, 12.31211899999999914712, -3.30518700000000009709, 148.01025699999999574175, 22.45390199999999936153, 0.42035000000000000142, -0.46885799999999999699, 0.76057900000000000507, -1.36164400000000007651, 177.88423800000001051558, 23.88289199999999823376, 1.21063900000000002066, -1.61197400000000001796, 8.53437800000000024170
loop 1, evaluations: 1, step: 1.000000000000000, fitness: -2.985306747253981
loop 2, evaluations: 2, step: 2.000000000000000, fitness: -2.985306733422826
loop 2, evaluations: 3, step: 4.000000000000000, fitness: -2.985306850987640
loop 3, evaluations: 4, step: 1.785714283530075, fitness: -2.985306755322155
loop 3, evaluations: 5, step: 2.595721102268431, fitness: -2.985306818714947
loop 3, evaluations: 6, step: 2.061540494505907, fitness: -2.985306771458502
loop 3, evaluations: 7, step: 1.912425577977664, fitness: -2.985306748406577
loop 3, evaluations: 8, step: 1.972377618796003, fitness: -2.985306729965038
loop 3, evaluations: 9, step: 1.973523614802404, fitness: -2.985306733422826
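A note on reading the log above: each hessian[i][j] line appears to be a central-difference estimate, combining four likelihood evaluations at the point shifted by plus or minus step[i] and step[j] and dividing by 4 * step[i] * step[j]. Each gradient[i] line looks like the two-point version, divided by 2 * step[i]. The Python sketch below reproduces that arithmetic under those assumptions. It is not the project's hessian.c or gradient.c: the function names and the quadratic test function are made up for illustration, and the order of the shifted evaluations is inferred from the printed formula rather than from the source.

```python
def central_gradient(f, point, step):
    """Estimate df/dx_i as (f(x + s_i) - f(x - s_i)) / (2 * s_i)."""
    grad = []
    for i, s in enumerate(step):
        forward = list(point)
        backward = list(point)
        forward[i] += s
        backward[i] -= s
        grad.append((f(forward) - f(backward)) / (2.0 * s))
    return grad


def central_hessian(f, point, step):
    """Estimate d2f/dx_i dx_j from four evaluations per entry."""
    n = len(point)
    hess = [[0.0] * n for _ in range(n)]

    def shifted(i, j, sign_i, sign_j):
        # Shift coordinates i and j. When i == j the shifts add up,
        # giving x_i + 2*s_i, x_i (twice), and x_i - 2*s_i.
        x = list(point)
        x[i] += sign_i * step[i]
        x[j] += sign_j * step[j]
        return f(x)

    for i in range(n):
        for j in range(i, n):
            numerator = (shifted(i, j, +1, +1) - shifted(i, j, +1, -1)
                         - shifted(i, j, -1, +1) + shifted(i, j, -1, -1))
            value = numerator / (4.0 * step[i] * step[j])
            hess[i][j] = hess[j][i] = value  # the matrix is symmetric
    return hess


def quadratic(x):
    # Hypothetical stand-in for the likelihood function.
    return x[0] ** 2 + 3.0 * x[0] * x[1] + 2.0 * x[1] ** 2


if __name__ == "__main__":
    print(central_gradient(quadratic, [1.0, 2.0], [1e-3, 2e-3]))
    print(central_hessian(quadratic, [1.0, 2.0], [1e-3, 2e-3]))
```

For the quadratic above, the exact gradient at (1, 2) is (8, 11) and the exact Hessian is [[2, 3], [3, 4]], so the printed values should agree with those up to rounding. In the real log the numerator subtracts nearly equal likelihoods, so with steps this small only a few significant digits survive, which may be why some entries come out as exactly 0.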
____________
Output file:
hessian [14 x 14]:
882.45647594797912915965 -6.30326069117614817827 -792.40988770656883843913 -69.63602102357431533619 54.02794739373106125413 55.22856894035754748984 21.61117881871454571296 -198.10247886553611351701 -1152.59621291663461306598 36.01863252100656609400 10.80558975630196805184 -22.81180013404456374815 90.04657991473762024270 108.05589062412579437478
-6.30326069117614817827 2.70139735233931821412 3.60186325210065616531 3.36173894277536033925 -1.44074526614579290218 -0.42021737941174319708 3.42176998541221477623 0.90046581302516404133 64.83353715003303818776 0.24012417054741772016 0.09004658130251640136 0.42021735628209683222 2.07107130056893806724 -3.60186325210065616531
-792.40988770656883843913 3.60186325210065616531 -1152.59635169451257752371 96.04968302194076557043 -36.01863252100655898857 28.81490786717695939956 0.00000000000000000000 288.14903241247691312310 3169.63955082627535375650 19.20993364379341983295 -93.64844177905949607066 0.00000185037170770859 7.20372650420131233062 -432.22359025207879312802
-69.63602102357431533619 3.36173894277536033925 96.04968302194076557043 3.52182172314030594862 -4.56235993429032671287 1.28066248963578899200 -1.20062099151496659566 -7.20372650420131321880 -67.23477700513551269523 1.92099373445368359903 -1.44074520832167696227 -0.32016562240894724800 -0.48024834109483544031 0.00000000000000000000
54.02794739373106125413 -1.44074526614579290218 -36.01863252100655898857 -4.56235993429032671287 1.08055890624125772170 -2.64136629235522901737 1.62083853283423429126 -1.80093162605032808266 7.20372927975887389351 5.28273267722904371624 6.30326055239827010013 -2.88149055542123200269 0.18009316260503280271 5.40279487815098402592
55.22856894035754748984 -0.42021737941174319708 28.81490786717695939956 1.28066248963578899200 -2.64136629235522901737 3.92202875115148996699 -7.32378842756749648402 7.20372696679423984989 33.61738850256775634762 4.16215302963725708452 1.08055892937090414208 0.80041402518283966128 -0.72037265042013121086 1.20062108403355227715
21.61117881871454571296 3.42176998541221477623 0.00000000000000000000 -1.20062099151496659566 1.62083853283423429126 -7.32378842756749648402 1.08055883685231868263 -12.60652068846290596582 -108.05589756301968407115 -2.88149050916193960603 -3.24167685750165146530 4.08211154693619882039 1.08055897563019676078 14.40745231451323427052
-198.10247886553611351701 0.90046581302516404133 288.14903241247691312310 -7.20372650420131321880 -1.80093162605032808266 7.20372696679423984989 -12.60652068846290596582 36.01862558211266218677 -648.33538537811818969203 -2.40124216806710455430 -45.02328926347941973063 -25.21304230211167052289 -19.81024719266421740826 -36.01863252100656609400
-1152.59621291663461306598 64.83353715003303818776 3169.63955082627535375650 -67.23477700513551269523 7.20372927975887389351 33.61738850256775634762 -108.05589756301968407115 -648.33538537811818969203 6051.13004148449817876099 -76.83974937814734573749 201.70433656652161857892 -24.01241983029933635407 151.27825103711242604732 -216.11179512603939656401
36.01863252100656609400 0.24012417054741772016 19.20993364379341983295 1.92099373445368359903 5.28273267722904371624 4.16215302963725708452 -2.88149050916193960603 -2.40124216806710455430 -76.83974937814734573749 1.28066236627767526812 -1.68086951764697278833 2.08107648397910027782 -6.96360219487601650457 14.40745300840262643760
10.80558975630196805184 0.09004658130251640136 -93.64844177905949607066 -1.44074520832167696227 6.30326055239827010013 1.08055892937090414208 -3.24167685750165146530 -45.02328926347941973063 201.70433656652161857892 -1.68086951764697278833 5.76298106458317160872 0.24012421680671039437 2.88149053229158580436 -10.80558975630196805184
-22.81180013404456374815 0.42021735628209683222 0.00000185037170770859 -0.32016562240894724800 -2.88149055542123200269 0.80041402518283966128 4.08211154693619882039 -25.21304230211167052289 -24.01241983029933635407 2.08107648397910027782 0.24012421680671039437 3.28169753717312406849 2.28117996714516335643 -2.40124216806710455430
90.04657991473762024270 2.07107130056893806724 7.20372650420131233062 -0.48024834109483544031 0.18009316260503280271 -0.72037265042013121086 1.08055897563019676078 -19.81024719266421740826 151.27825103711242604732 -6.96360219487601650457 2.88149053229158580436 2.28117996714516335643 20.35052688864613301689 -16.20838463445295474230
108.05589062412579437478 -3.60186325210065616531 -432.22359025207879312802 0.00000000000000000000 5.40279487815098402592 1.20062108403355227715 14.40745231451323427052 -36.01863252100656609400 -216.11179512603939656401 14.40745300840262643760 -10.80558975630196805184 -2.40124216806710455430 -16.20838463445295474230 198.10247192664220960978
gradient[14]: -0.00633927915716370194, -0.00025933414860013215, -0.00576298120336105058, 0.00028814905276656572, -0.00011525961851610587, -0.00010565465539495258, 0.00011525961851610587, 0.00086444718050415759, -0.00518668286098034059, -0.00013446955401027102, -0.00037459377266735311, -0.00012486458903874600, 0.00048985339118345905, -0.00057629806482495383
initial_fitness: -2.98530684176687044484
inital_parameters[14]: 0.57171300000000002672, 12.31211899999999914712, -3.30518700000000009709, 148.01025699999999574175, 22.45390199999999936153, 0.42035000000000000142, -0.46885799999999999699, 0.76057900000000000507, -1.36164400000000007651, 177.88423800000001051558, 23.88289199999999823376, 1.21063900000000002066, -1.61197400000000001796, 8.53437800000000024170
result_fitness: -2.98530673342282648619
result_parameters[14]: 0.57171023310301738451, 12.31159045077501801302, -3.30517495098843516743, 148.01059727942629251629, 22.45401028483425420745, 0.42010611491115568139, -0.46892261685864594645, 0.76055531511989238336, -1.36164776621840832860, 177.88429089670174221283, 23.88286053550067578044, 1.21053790484657564086, -1.61181317710548888122, 8.53439162571984688554
number_evaluations: 444
metadata: it: 5, ev: 588
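Reading the log above, each gradient[i] line looks like a central difference: two fitness values, one for each side of the parameter, with their difference divided by 2 * step. Below is a minimal Python sketch of that reading. The function names and the toy fitness are made up for illustration and are not taken from the milkyway code, and which of the two logged values belongs to +step is an assumption.

```python
def central_difference_gradient(fitness, point, steps):
    """Approximate the gradient of `fitness` at `point`.

    Hypothetical helper: for each parameter i, evaluate the fitness at
    point[i] + steps[i] and point[i] - steps[i], then divide the difference
    by 2 * steps[i] (the pattern the logged gradient[i] lines appear to use).
    """
    gradient = []
    for i, h in enumerate(steps):
        forward = list(point)
        backward = list(point)
        forward[i] += h
        backward[i] -= h
        gradient.append((fitness(forward) - fitness(backward)) / (2.0 * h))
    return gradient


def gradient_entry(first_fitness, second_fitness, step):
    """Recompute a single logged entry from its two fitness values."""
    return (first_fitness - second_fitness) / (2.0 * step)


# gradient[5] from the log: two fitness values and a step of 0.000060
logged_5 = gradient_entry(-2.98530683369869720423,
                          -2.98530682102013855683,
                          0.000060)

# Toy check on f(x, y) = x^2 + 3y, whose exact gradient at (1, 2) is (2, 3)
toy = central_difference_gradient(lambda p: p[0] ** 2 + 3.0 * p[1],
                                  [1.0, 2.0], [1e-4, 1e-4])
print(logged_5, toy)
```

Recomputing the entry this way reproduces the logged gradient[5] value to many digits, which supports the central difference reading.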
|
|
|
TravisVolunteer moderator Project administrator Project developer Project tester Project scientist Send message
Joined: 30 Aug 07 Posts: 1976 Credit: 26,480 RAC: 0
|
|
Sweet. Looks like you're getting what I'm getting. Is that for stripe 20?
____________
|
|
|
|
|
|
Yes, for stripe 20.
|
|
|
|
initial likelihood: -2.98530684176687044484
Sweet. Looks like you're getting what I'm getting. Is that for stripe 20?
If that is okay, I guess my single precision ATI implementation will do it, too.
I'm getting a fitness of -2.985312812926748 for the stripe20 test unit.
As a reference point, the stock CPU app (using a complete DP calculation) arrives at a value of -2.985312797571472.
So it appears my approach is actually a bit (two digits, i.e. 6 bits) more precise :D
At the moment I'm using the integration layout mentioned in the other thread (mu-r plane) and doing all summations with the Kahan method. This includes the convolution loop, the summation of the values across the different mu-r planes, and the final reduction (done on the GPU in SP as a tree-like Kahan sum). That way I have to transfer virtually nothing (16 bytes or so for the whole integral) back from the GPU to the CPU, so it appears unnecessary to do the reduction on the CPU. I should mention, though, that I do all CPU operations (including the likelihood computation) in DP. I still have to test whether one loses precision there (or have you tried that already, Travis?).
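For readers who haven't met the Kahan method: it keeps a second variable holding the rounding error of each addition and feeds it back into the next one. The sketch below is plain Python that emulates single precision by rounding through the struct module. It is only an illustration of the technique, not the GPU kernel discussed here, and all names in it are made up.

```python
import struct


def to_f32(x):
    """Round a Python float (double) to the nearest single precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def naive_sum_f32(values):
    """Plain running sum with every intermediate rounded to single precision."""
    total = 0.0
    for v in values:
        total = to_f32(total + to_f32(v))
    return total


def kahan_sum_f32(values):
    """Kahan (compensated) sum in emulated single precision."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = to_f32(to_f32(v) - compensation)
        t = to_f32(total + y)
        # (t - total) is what was actually added; subtracting y leaves the error
        compensation = to_f32(to_f32(t - total) - y)
        total = t
    return total


values = [0.1] * 100_000
exact = to_f32(0.1) * len(values)  # reference computed in double precision
naive_error = abs(naive_sum_f32(values) - exact)
kahan_error = abs(kahan_sum_f32(values) - exact)
print(naive_error, kahan_error)
```

The naive sum drifts visibly for this input, while the compensated sum stays within about one single precision rounding step of the reference.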
|
|
|
TravisVolunteer moderator Project administrator Project developer Project tester Project scientist Send message
Joined: 30 Aug 07 Posts: 1976 Credit: 26,480 RAC: 0
|
|
I'm working on getting the Kahan summation working to see how much that improves the accuracy.
The general plan for milkyway_gpu is to have 2 types of applications:
1. single precision (probably using Kahan summation in the kernel for highest accuracy) for GPUs that don't support double precision
2. double precision for GPUs that do support double precision
The server will have different validation for comparing single <-> single, single <-> double, and double <-> double results, and we'll update our searches using the single precision values for exploration and the double precision values for highly accurate exploitation.
I'm hoping to have the code updated tomorrow with double and single precision kernels (with the single precision kernel doing a Kahan summation). I'll post the values I'm getting for the different streams made available in the code package and we can work from there.
After that the main priority will be getting workunits available on milkyway_gpu.
____________
|
|
|
|
|
|
CPU (Intel Core 2 E6750) vs. GPU (NVIDIA GeForce 9600 GT) computation times with the linux_x86_64 build on stripe 20. Could you explain this?
CPU:
real 4m15.913s
user 4m6.327s
sys 0m0.800s
GPU:
real 15m45.243s
user 15m30.878s
sys 0m0.484s
About the makefile: what is the difference between these two lines?
48: LINUX_LDFLAGS_i686 = -L/usr/X11R6/lib -L/usr/local/lib
53: LINUX_LDFLAGS_x86_64 = -L/usr/local/lib
|
|
|
|
|
looks like the cpu is faster...
____________
mic.
|
|
|
|
|
looks like the cpu is faster...
Or the GPU calculates quite a bit more ;)
|
|
|
|
|
looks like the cpu is faster...
Or the GPU calculates quite a bit more ;)
;))
____________
mic.
|
|
|
|
|
looks like the cpu is faster...
Or the GPU calculates quite a bit more ;)
;))
As long as the credit is appropriate ;) |
|
|
|
|
looks like the cpu is faster...
Or the GPU calculates quite a bit more ;)
;))
I guess the real reason is that the test units are much smaller than the real production units, so the performance is severely limited by the administrative overhead. The integrals of current production WUs are a factor of 16 larger than those of the test WUs. And I don't know what the CPU has actually done to arrive at the numbers posted by trisf. Be assured that even mainstream GPUs (let alone high-end ones) will be faster than the CPU.
|
|
|
|
|
How do I build a static GPU (CUDA) binary?
dynamic: ok
static: /usr/bin/ld: cannot find -lcudart
|
|
TravisVolunteer moderator Project administrator Project developer Project tester Project scientist Send message
Joined: 30 Aug 07 Posts: 1976 Credit: 26,480 RAC: 0
|
How do I build a static GPU (CUDA) binary?
dynamic: ok
static: /usr/bin/ld: cannot find -lcudart
You need to have the CUDA compiler (nvcc), runtime and drivers installed to be able to compile the application.
____________
|
|
|
|
|
initial likelihood: -2.98530684176687044484
Sweet. Looks like you're getting what I'm getting. Is that for stripe 20?
If that is okay, I guess my single precision ATI implementation will do it, too.
I'm getting a fitness of -2.985312812926748 for the stripe20 test unit.
As a reference point, the stock CPU app (using a complete DP calculation) arrives at a value of -2.985312797571472.
So it appears my approach is actually a bit (two digits, i.e. 6 bits) more precise :D
At the moment I'm using the integration layout mentioned in the other thread (mu-r plane) and doing all summations with the Kahan method. This includes the convolution loop, the summation of the values across the different mu-r planes, and the final reduction (done on the GPU in SP as a tree-like Kahan sum). That way I have to transfer virtually nothing (16 bytes or so for the whole integral) back from the GPU to the CPU, so it appears unnecessary to do the reduction on the CPU. I should mention, though, that I do all CPU operations (including the likelihood computation) in DP. I still have to test whether one loses precision there (or have you tried that already, Travis?).
I have an update to this. Sorry for putting it here and not in the other thread, but the comparison values posted by trisf are here.
I now do the likelihood computation on the GPU as well. As the GPU project code does it some hundred times (as opposed to only once in the legacy version), it is definitely faster than on the CPU. And I can now clearly state that the likelihood computation is not hurting the precision. After a bit of tuning I'm now getting an initial likelihood value of -2.985312794995786 (the double precision CPU version arrives at -2.985312797571472). Not that bad for single precision, I would say :D
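To put the likelihood values quoted in this thread side by side, here is a small Python sketch that turns each difference from the double precision CPU value into a rough count of matching decimal digits. The helper name and the labels are made up, and the sketch assumes all values refer to the same stripe 20 test unit and are directly comparable.

```python
import math


def matching_digits(value, reference):
    """Rough number of decimal digits on which `value` agrees with `reference`."""
    relative_error = abs(value - reference) / abs(reference)
    return -math.log10(relative_error)


cpu_double = -2.985312797571472  # stock CPU app, double precision
results = {
    "initial_fitness from the out file posted above": -2.98530684176687044484,
    "ATI single precision, first version": -2.985312812926748,
    "ATI single precision, tuned": -2.985312794995786,
}
digits = {name: matching_digits(v, cpu_double) for name, v in results.items()}
for name, d in digits.items():
    print(f"{name}: about {d:.1f} matching digits")
```

With these inputs the first value agrees to a bit under 6 digits and the two ATI single precision values to roughly 8 and 9 digits.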
The complete output file (sorry for the font size ;):
hessian [14 x 14]:
-390.01297330587551000 19.52481024081187000 -1533.2734804029969000 0.36626257582383914000 0.02027267242965535500 -0.05197370311904592200 -0.03842967610800939600 0.69457634088720965000 1579.36089206778270000 0.17400710502120850000 0.47681164572210827000 -0.03224550256438382000 -0.38525987955395630000 -2.27792784635028060000
19.52481024081187000 -0.89935716859890202000 79.88489592047896800000 -0.00695203154303195800 -0.00148870499261377610 -0.00311565588143973040 -0.00698965191281430890 -0.06931503981899565800 -77.84448502468065100000 0.01937431696556283400 0.03816721244609410500 0.00344173763563067730 -0.01042395336714463200 0.23676720306564644000
-1533.27348040299690000 79.88489592047896800000 -6506.29849996420310000 -10.64588417420964100000 0.71608552421054117000 -0.34682442103436034000 -0.69671490798839375000 -10.06847383244746700000 6213.89129040750320000 1.37108842797791410000 0.75474071437042756000 0.17694549529304973000 0.88234974882084305000 -405.46663249152459000
0.36626257582383914000 -0.00695203154303195800 -10.64588417420964100000 -0.35369263073903312000 0.14107354173731321000 0.13820025371005487000 0.23365800035553733000 0.46932180364223086000 8.19744272462230580000 -0.03979594431768873600 -0.02920173362378856000 0.01743864312212887700 0.00161407924063420640 -0.14854321476557666000
0.02027267242965535500 -0.00148870499261377610 0.71608552421054117000 0.14107354173731321000 -0.16454788920317040000 0.01043720665450109500 0.01784725145448362200 -0.13678780330650397000 -6.07685013420677840000 0.00068149189994907511 0.01583087827494722100 -0.01504449342881741600 -0.03803707349092632500 0.32327057697401068000
-0.05197370311904592200 -0.00311565588143973040 -0.34682442103436034000 0.13820025371005487000 0.01043720665450109500 -0.08778481028512727400 0.03903026050503891100 0.09120667184466431400 0.24538889438948294000 0.02641849701963868200 -0.00499808527898437570 -0.01218546868347263800 0.00932971292814480800 -0.03992223218673984800
-0.03842967610800939600 -0.00698965191281430890 -0.69671490798839375000 0.23365800035553733000 0.01784725145448362200 0.03903026050503891100 -0.06100904503814063400 0.21312882014790088000 -0.70542460761657810000 -0.02401597439434984000 0.01272565386400969900 0.01001337901485044100 -0.01750821709833871500 0.00449987269668383670
0.69457634088720965000 -0.06931503981899565800 -10.06847383244746700000 0.46932180364223086000 -0.13678780330650397000 0.09120667184466431400 0.21312882014790088000 -12.38068675357695300000 10.24474949318232600000 -0.18561448674366451000 -0.34162186968167413000 0.08297251774536107400 -0.04154662724964452300 0.52725879218229466000
1579.36089206778270000000 -77.84448502468065100000 6213.89129040750320000000 8.19744272462230580000 -6.07685013420677840000 0.24538889438948294000 -0.70542460761657810000 10.24474949318232600000 -6570.86651756344510000000 -1.84772567616657100000 -0.98881736132483400000 -2.05113148687985360000 -0.64424021672948573000 -423.39445838202039000000
0.17400710502120850000 0.01937431696556283400 1.37108842797791410000 -0.03979594431768873600 0.00068149189994907511 0.02641849701963868200 -0.02401597439434984000 -0.18561448674366451000 -1.84772567616657100000 -1.68289518123445860000 0.43500184935633485000 -0.83326524692574189000 0.08769457382484800700 0.43554789404727973000
0.47681164572210827000 0.03816721244609410500 0.75474071437042756000 -0.02920173362378856000 0.01583087827494722100 -0.00499808527898437570 0.01272565386400969900 -0.34162186968167413000 -0.98881736132483400000 0.43500184935633485000 -0.32984580344841413000 -0.03374971598487282200 0.00027394753132625732 10.65527388544040700000
-0.03224550256438382000 0.00344173763563067730 0.17694549529304973000 0.01743864312212887700 -0.01504449342881741600 -0.01218546868347263800 0.01001337901485044100 0.08297251774536107400 -2.05113148687985360000 -0.83326524692574189000 -0.03374971598487282200 0.03124775130005888100 -0.79111966977407622000 -6.42424817047052970000
-0.38525987955395630000 -0.01042395336714463200 0.88234974882084305000 0.00161407924063420640 -0.03803707349092632500 0.00932971292814480800 -0.01750821709833871500 -0.04154662724964452300 -0.64424021672948573000 0.08769457382484800700 0.00027394753132625732 -0.79111966977407622000 0.02997525838654979300 -10.10300454407086900000
-2.27792784635028060000 0.23676720306564644000 -405.46663249152459000000 -0.14854321476557666000 0.32327057697401068000 -0.03992223218673984800 0.00449987269668383670 0.52725879218229466000 -423.39445838202039000000 0.43554789404727973000 10.65527388544040700000 -6.42424817047052970000 -10.10300454407086900000 -98.01699035749678000000
gradient[14]: -0.002031011692160689, 3.870287978990916e-005, -0.00391933197008143, -4.165342145275493e-006, 3.936584391794895e-006, -4.114278547480884e-005, 3.338801457530849e-005, -3.847977492199561e-006, 0.005981095174689699, 6.980741170300082e-005, 4.713984758097922e-005, -2.486414777772931e-006, 4.213692172960747e-005, 0.0004124264263438704
initial_fitness: -2.98531279499578610000
inital_parameters[14]: 0.571713, 12.312119, -3.305187, 148.010257, 22.453902, 0.42035, -0.468858, 0.760579, -1.361644, 177.884238, 23.882892, 1.210639, -1.611974, 8.534378
result_fitness: -2.98531254274110090000
result_parameters[14]: 0.570715379831475, 12.31644736374576, -3.305076710884835, 148.0107118875137, 22.46190531351187, 0.4232579677285489, -0.4614999752737721, 0.7603950396852607, -1.361832050323567, 177.8835356891613, 23.88085791689922, 1.211781892132873, -1.610141258371338, 8.534315057698125
number_evaluations: 445
metadata: it: 5, ev: 588
By the way, that small test unit took about 4 minutes, i.e. half a second per evaluation ;)
But as it is a two stream WU with 80 x 800 x 350 spatial and 60 convolution steps, one can estimate the time for the bigger WUs (neglecting that the efficiency rises slightly with the size). Current production WUs have up to 320 x 1600 x 700 spatial and 120 convolution steps. That is a factor of 4 x 2 x 2 x 2 = 32 bigger. It would take about two hours in total for about 445 * 3.6 TFlop ~ 1.6 Peta(!)Flop. That is about 220 GFlop/second with an overclocked (10%) HD3870.
I should not start thinking about what the newer GPUs or even the next generation can do. I would be prepared for north of 600 GFlops on a fast HD4800 :o
But unlike in double precision, the GT200 based NVIDIA cards have a chance to get close in single precision.
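The estimate above can be retraced with a few lines of Python. All figures come from the post; the variable names are made up, and the two hour figure is the rounded one used in the post (the unrounded scaling of 4 minutes gives slightly more).

```python
test_wu = {"spatial": (80, 800, 350), "convolution": 60}
production_wu = {"spatial": (320, 1600, 700), "convolution": 120}

# Size ratio: product of the per-dimension ratios and the convolution ratio
scale = production_wu["convolution"] // test_wu["convolution"]
for big, small in zip(production_wu["spatial"], test_wu["spatial"]):
    scale *= big // small

evaluations = 445
test_seconds = 4 * 60                    # "about 4 minutes" for the test unit
seconds_per_evaluation = test_seconds / evaluations
production_hours = test_seconds * scale / 3600.0

tflop_per_evaluation = 3.6               # figure quoted for a production WU
total_pflop = evaluations * tflop_per_evaluation / 1000.0
rounded_seconds = 2 * 3600               # "about two hours" as in the post
gflops = evaluations * tflop_per_evaluation * 1000.0 / rounded_seconds

print(scale, seconds_per_evaluation, production_hours, total_pflop, gflops)
```

This gives a factor of 32, roughly half a second per evaluation on the test unit, a bit over two hours for a production WU, about 1.6 PFlop in total and a little over 220 GFlop/s.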
|
|
|
|
By the way, that small test unit took about 4 minutes, i.e. half a second per evaluation ;)
But as it is a two stream WU with 80 x 800 x 350 spatial and 60 convolution steps, one can estimate the time for the bigger WUs (neglecting that the efficiency rises slightly with the size). Current production WUs have up to 320 x 1600 x 700 spatial and 120 convolution steps. That is a factor of 4 x 2 x 2 x 2 = 32 bigger. It would take about two hours total
Could you give us a quick summary of what the GPUs are doing that the CPU code doesn't? It sounds like they're computing a great deal more! |
|
|
|
|
By the way, that small test unit took about 4 minutes, i.e. half a second per evaluation ;)
But as it is a two stream WU with 80 x 800 x 350 spatial and 60 convolution steps, one can estimate the time for the bigger WUs (neglecting that the efficiency rises slightly with the size). Current production WUs have up to 320 x 1600 x 700 spatial and 120 convolution steps. That is a factor of 4 x 2 x 2 x 2 = 32 bigger. It would take about two hours total
Could you give us a quick summary of what the GPUs are doing that the CPU code doesn't? It sounds like they're computing a great deal more!
The CPU code does just one "evaluation". That means the server sends a set of parameters (a WU) for a small volume of the Milky Way, and the CPU code checks how well these parameters fit reality, i.e. the observed stars in that region. The result (called "fitness" or "likelihood") is then sent back. From all the results the server tries to determine (using different algorithms like genetic search [gs_ WUs] or particle search [ps_ WUs]) in which directions the parameters have to evolve to get a better fitness.
The difference with the GPU project is that the app checks not a single set of parameters but, more or less, a region of parameter sets. That's why there are so many numbers in the result file of the GPU code posted above ;) In principle the scientific app takes over a small part of the search algorithm. It does not do just one simple check; it looks around a bit to see in which direction the fitness improves. In the case of the double stream WUs this means the app does 445 evaluations of different parameter sets instead of a single one. So it is really a great deal more work. For triple stream WUs it would actually be about 900 evaluations, as there are more parameter combinations possible (but only about 150 for single stream WUs).
|
|
|
|
|
Thanks :) I'm looking forward to the GPU apps even more now! |
|
|
|
|
By the way, that small test unit took about 4 minutes, i.e. half a second per evaluation ;)
But as it is a two stream WU with 80 x 800 x 350 spatial and 60 convolution steps, one can estimate the time for the bigger WUs (neglecting that the efficiency rises slightly with the size). Current production WUs have up to 320 x 1600 x 700 spatial and 120 convolution steps. That is a factor of 4 x 2 x 2 x 2 = 32 bigger. It would take about two hours in total for about 445 * 3.6 TFlop ~ 1.6 Peta(!)Flop. That is about 220 GFlop/second with an overclocked (10%) HD3870.
I've just done an actual count of all the operations on the GPU (one can neglect the preparation of the lookup tables done on the CPU). One evaluation of the stripe20 test WU represents about 113.94 GFlop with the single precision code.
445 evaluations * 113.94 GFlop / 215.14 seconds = 235 GFlop/s for that HD3870@860MHz
(theoretical peak would be 550 GFlops, that of a 3GHz quad core only 96 GFlops).
I used quite simple counting rules: an addition or a multiplication counts as 1 flop, a division (only real ones, not those transformed into a multiplication by the compiler) or a square root counts as 4 flops, and pow/exp/log also count as only 4 flops. This is a small deviation from the "standard" practice (if such a thing exists in this respect) of counting 8 flops for the latter. The reason is that the GPUs have hardware support for these instructions (as opposed to double precision), with a fourth to a fifth of the throughput of the simple instructions (the same throughput as for divisions or square roots).
Furthermore I was a bit conservative: exp(x), for instance, is executed as pow(2, x*log2(e)) and log(x) as log2(x)*log(2), but I counted each whole construct as 4 flops. The actual implementation depends on the exact GPU and may change in the future. Therefore I decided to count the instructions in the C code and not in the GPU assembly as I did for the double precision version. This results in a considerably lower flop count than for double precision, but that should be compensated by the far higher execution speed of single precision.
@Travis:
You will get a PM. |
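The counting rules and the throughput figure from this post can be written down as a short Python sketch. The weights and the measured numbers are the ones given above. The function names and the example operation mix are invented purely to show how the weights are applied and do not describe the real kernel.

```python
# Weights per operation as described in the post (single precision GPU code)
FLOP_WEIGHTS = {
    "add": 1, "mul": 1,
    "div": 4, "sqrt": 4,
    "pow": 4, "exp": 4, "log": 4,
}


def count_flops(operation_counts):
    """Weighted flop total for a dict such as {"add": 10, "exp": 2}."""
    return sum(FLOP_WEIGHTS[op] * n for op, n in operation_counts.items())


def sustained_gflops(evaluations, gflop_per_evaluation, seconds):
    """Average throughput in GFlop/s over a whole run."""
    return evaluations * gflop_per_evaluation / seconds


# Made-up operation mix, only to show how the weights are applied
example_flops = count_flops({"add": 3, "mul": 4, "div": 1, "exp": 1})

measured = sustained_gflops(445, 113.94, 215.14)  # figures from the post
peak = 550.0                                      # peak quoted in the post
utilisation = measured / peak
print(example_flops, measured, utilisation)
```

With the quoted figures this comes out at a little over 235 GFlop/s, a bit more than 40% of the quoted theoretical peak.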
|
|