OpenCL for Nvidia available for testing

CTAPbIi Joined: 4 Jan 10 Posts: 86 Credit: 51,753,924 RAC: 0	Message 44622 - Posted: 3 Dec 2010, 17:46:54 UTC - in response to Message 44620. Last modified: 3 Dec 2010, 17:47:24 UTC
Quote: The WUs are running fine now. :-)
How many seconds, and on which card?

arkayn Joined: 14 Feb 09 Posts: 999 Credit: 74,932,619 RAC: 0	Message 44623 - Posted: 3 Dec 2010, 17:53:12 UTC
Here is the stderr that came up on my 460.
Run time: 614.917938
CPU time: 31.6875
Stderr output:
<core_client_version>6.12.6</core_client_version>
<![CDATA[
<stderr_txt>
<search_application> milkywayathome separation 0.48 Windows x86 double OpenCL </search_application>
Found 1 platforms
Platform 0 information:
  Platform name: NVIDIA CUDA
  Platform version: OpenCL 1.0 CUDA 3.2.1
  Platform vendor:
  Platform profile:
  Platform extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
Using device 0 on platform 0
Found 1 CL devices
Device GeForce GTX 460 (NVIDIA Corporation:0x10de)
Type: CL_DEVICE_TYPE_GPU
Driver version: 263.06
Version: OpenCL 1.0 CUDA
Compute capability: 2.1
Little endian: CL_TRUE
Error correction: CL_FALSE
Image support: CL_TRUE
Address bits: 32
Max compute units: 7
Clock frequency: 1600 Mhz
Global mem size: 804847616
Max mem alloc: 201211904
Global mem cache: 114688
Cacheline size: 128
Local mem type: CL_LOCAL
Local mem size: 49152
Max const args: 9
Max const buf size: 65536
Max parameter size: 4352
Max work group size: 1024
Max work item dim: 3
Max work item sizes: { 1024, 1024, 64 }
Mem base addr align: 4096
Min type align size: 128
Timer resolution: 1000 ns
Double extension: MW_CL_KHR_FP64
Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64
Compiler flags: -cl-mad-enable -cl-no-signed-zeros -cl-strict-aliasing -cl-finite-math-only -DUSE_CL_MATH_TYPES=0 -DUSE_MAD=0 -DUSE_FMA=0 -cl-nv-verbose -DDOUBLEPREC=1 -DMILKYWAY_MATH_COMPILATION -DNSTREAM=3 -DFAST_H_PROB=1 -DAUX_BG_PROFILE=0 -DUSE_IMAGES=1 -DI_DONT_KNOW_WHY_THIS_DOESNT_WORK_HERE=1
Build status: CL_BUILD_SUCCESS
Build log:
: Considering profile 'compute_20' for gpu='sm_21' in 'cuModuleLoadDataEx_4'
Kernel work group info:
  Work group size = 512
  Kernel local mem size = 0
  Compile work group size = { 0, 0, 0 }
Lower n solution: n = 40, x = 0
Higher n solution: n = 40, x = 0
Using solution: n = 40, x = 0
Range: { nu_steps = 640, mu_steps = 1600, r_steps = 1400 }
Iteration area: 2240000
Chunk estimate: 40
Num chunks: 40
Added area: 0
Effective area: 2240000
Integration time: 584.577879 s. Average time per iteration = 913.402936 ms
<background_integral> 0.00057622614096211838 </background_integral>
<stream_integrals> 105.68890505238272000000 172.60556751830708000000 161.27537590087846000000 </stream_integrals>
<background_only_likelihood> -3.29958249045773530000 </background_only_likelihood>
<stream_only_likelihood> -37.34692154949922100000 -4.75605645991618700000 -3.79966623105932080000 </stream_only_likelihood>
<search_likelihood> -3.00484287881831240000 </search_likelihood>
10:44:59 (2892): called boinc_finish
</stderr_txt>
]]>
Validate state: Valid
Claimed credit: 0.177164871378224
Granted credit: 213.760359413782
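The figures in that log are internally consistent, which makes a handy sanity check when comparing runs. This small Python sketch is only my own arithmetic on the logged values, not anything the application does; it assumes that "Iteration area" is the mu_steps × r_steps grid and that one iteration is performed per nu step:

```python
# Recompute two figures from the GTX 460 stderr log.
# Inputs are copied from the log's "Range" and "Integration time" lines.
nu_steps, mu_steps, r_steps = 640, 1600, 1400
integration_time_s = 584.577879

# Assumption: "Iteration area" is the mu x r grid covered in one iteration.
iteration_area = mu_steps * r_steps

# Assumption: one iteration per nu step, so the average iteration time is
# the total integration time spread evenly over nu_steps.
avg_ms_per_iteration = integration_time_s / nu_steps * 1000.0

print(f"Iteration area: {iteration_area}")
print(f"Average time per iteration = {avg_ms_per_iteration:.6f} ms")
```

Both printed values (2240000 and 913.402936 ms) match the log, which supports the two assumptions above for this run.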

CTAPbIi Joined: 4 Jan 10 Posts: 86 Credit: 51,753,924 RAC: 0	Message 44624 - Posted: 3 Dec 2010, 18:02:28 UTC Last modified: 3 Dec 2010, 18:04:44 UTC
615 secs??? Wow... even if you run 2 WUs concurrently, that's too much. My 4870 runs it in 325 secs and my 4890 in 312 secs. But then, Nvidia cut double precision (DP) on the Fermi cards... Anyway, I'll try it on a GTX 275.

arkayn Joined: 14 Feb 09 Posts: 999 Credit: 74,932,619 RAC: 0	Message 44625 - Posted: 3 Dec 2010, 18:06:01 UTC
Yes, I know; my 5830 runs the same type of unit in 130 seconds.

CTAPbIi Joined: 4 Jan 10 Posts: 86 Credit: 51,753,924 RAC: 0	Message 44628 - Posted: 3 Dec 2010, 18:19:45 UTC Last modified: 3 Dec 2010, 18:22:57 UTC
So, there is room for improvement :-)

Matt Arsenault Volunteer moderator Project developer Project tester Project scientist Joined: 8 May 10 Posts: 576 Credit: 15,979,383 RAC: 0	Message 44630 - Posted: 3 Dec 2010, 18:24:45 UTC - in response to Message 44628.
Quote: So, there is some room for improvement :-)
The theoretical double-precision performance of ATI hardware is much higher than that of Nvidia hardware, so the Nvidia app isn't expected to match it. It's expected to be about as fast as, or marginally faster than, the old CUDA application.

arkayn Joined: 14 Feb 09 Posts: 999 Credit: 74,932,619 RAC: 0	Message 44631 - Posted: 3 Dec 2010, 18:30:19 UTC - in response to Message 44630.
Quote: So, there is some room for improvement :-) The theoretical performance of doubles on ATI hardware is much higher than on Nvidia, so it's not expected to match that. It's expected to be about the same or marginally faster than the old CUDA application.
With the old CUDA app that Crunch3r fixed for Fermi cards, I was getting around 11 minutes for the 213-point units.

CTAPbIi Joined: 4 Jan 10 Posts: 86 Credit: 51,753,924 RAC: 0	Message 44632 - Posted: 3 Dec 2010, 18:32:30 UTC
Yep, agreed. Double precision (DP) runs at 1/8 of the single-precision (SP) rate instead of 1/2... so there's no change to my plans to get a pair of 6970s :-) When do you plan to release the OpenCL app for ATI? :-)
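For a rough sense of scale, the run times posted so far can be turned into speed ratios. This Python sketch only reuses figures quoted in this thread and assumes they all refer to the same ~213-credit unit type; the cards, clocks, and setups differ, and it ignores how many WUs each card was running at once, so treat the ratios as ballpark numbers rather than a benchmark. The helper name `speedup_vs` is made up for this example:

```python
# Run times (seconds) quoted in this thread for the ~213-credit units.
# Different cards, clocks and setups: treat the ratios as rough.
run_times_s = {
    "GTX 460 (OpenCL 0.48)": 614.917938,
    "HD 4870": 325.0,
    "HD 4890": 312.0,
    "HD 5830": 130.0,
}

def speedup_vs(baseline_card, times):
    """Return {card: baseline_time / card_time}; >1 means faster than baseline."""
    base = times[baseline_card]
    return {card: base / t for card, t in times.items()}

ratios = speedup_vs("GTX 460 (OpenCL 0.48)", run_times_s)
for card, s in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{card:22s} {run_times_s[card]:8.1f} s  {s:4.2f}x the GTX 460")
```

By these numbers the 5830 finishes a unit roughly 4.7 times faster than the GTX 460, which is consistent with Matt's point about double-precision throughput.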

Werkstatt Joined: 19 Feb 08 Posts: 350 Credit: 141,284,369 RAC: 0	Message 44636 - Posted: 3 Dec 2010, 21:27:07 UTC
I now have a couple of WUs finished. 11 min 34 sec is a typical time for de_separation_16_3s_fix_1_1137162_...
GTX 460 @ 715 MHz, Win7 64-bit, E8400, BOINC Manager 6.10.58, ForceWare 260.99
GPU usage ~98%, memory usage 255 MB (per Afterburner)
No invalid results or errors logged (but they don't stay listed for long...)

One World, One Dream Joined: 26 Dec 09 Posts: 1 Credit: 615,993 RAC: 0	Message 44638 - Posted: 3 Dec 2010, 22:39:42 UTC
I have tested the new OpenCL app with a GeForce GT 420M notebook GPU. With Crunch3r's optimized app, my work units used to take either 49 or 73 minutes to complete. With the new OpenCL app, they take either 51 or 75 minutes. I do not know why the new app is slightly slower; maybe it is because I was surfing the web and viewing some images while the work units ran? (Simple tasks like this never seemed to affect Crunch3r's old app, though.)
There is also a significant difference in system responsiveness and a slight difference in GPU temperature between the two apps. Crunch3r's app did not hurt system responsiveness at all, whereas the new OpenCL app makes the system react very sluggishly, so that writing something or surfing the web comfortably is no longer possible. Finally, the GPU temperature with the OpenCL app is about 3 degrees Celsius higher than with Crunch3r's app.
That is why I have switched back to the old app for the time being. I hope this information about the new app's performance is helpful.

europa Joined: 29 Oct 10 Posts: 89 Credit: 39,246,947 RAC: 0	Message 44644 - Posted: 4 Dec 2010, 1:00:55 UTC - in response to Message 44630.
Matt, I followed the link above and extracted the tar file into the MW subdirectory under boinc-client in /var; however, I am still getting exactly the same error message about the app not finding a double-precision card, even though it correctly identifies the Fermi. I'm on 64-bit Ubuntu. In addition, all of the WUs are identified as cuda23 units; there is no mention of OpenCL in the error message or in the WU log.
I notice that app_info.xml refers to: <file_info> <name>milkyway_separation_0.48_x86_64-pc-linux-gnu__cuda_opencl</name> <executable/> </file_info>
However, there is no such file. The only executable it unpacked was: milkyway_0.24_x86_64-pc-linux-gnu__cuda23
Also, does <coproc> <type>CUDA</type> <count>1</count> </coproc> refer to the number of processors or to the GPU's ID number? Mine always comes up as GPU0, which is why I ask.
Thanks, Steve

Matt Arsenault Volunteer moderator Project developer Project tester Project scientist Joined: 8 May 10 Posts: 576 Credit: 15,979,383 RAC: 0	Message 44645 - Posted: 4 Dec 2010, 1:11:20 UTC - in response to Message 44644.
Quote: Matt, I followed the link above to extract the tar file into the MW sub-dir under boinc-client on /var however I am still getting the exact same error message about the app not finding a double-precision card even though it accurately identifies the Fermi. I'm on 64-bit Ubuntu. In addition, all of the WU's are id'd as cuda 23 units, there is no mention in the error message or in the WU log of Open CL. I notice that in the app_info.xml it refers to: <file_info> <name>milkyway_separation_0.48_x86_64-pc-linux-gnu__cuda_opencl</name> <executable/> </file_info> However, there is no such file. The only executable that it unpacked was: milkyway_0.24_x86_64-pc-linux-gnu__cuda23
Well, BOINC is rather eager to delete anything that isn't mentioned in any of its XML files. It looks like something else went wrong first; then this file got deleted, and BOINC downloaded and used the CUDA one instead. You might need to chown what you extract to boinc:boinc for it to work. It seems to be unhappy when the boinc user doesn't own the files.
Quote: Also, does: <coproc> <type>CUDA</type> <count>1</count> </coproc> refer to the number of processors or to its ID number?
It's the count of GPUs that will be used. The application only uses 1, so it should always be 1.
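If BOINC has deleted a file that app_info.xml still names, the mismatch is easy to spot by comparing the `<file_info>` names against the project directory. A minimal Python sketch, assuming app_info.xml sits in the project directory alongside the executables; the function name `missing_files` is hypothetical:

```python
import os
import xml.etree.ElementTree as ET

def missing_files(app_info_path):
    """Return <file_info><name> entries from app_info.xml that do not exist
    in the same directory (assumed to be the project directory)."""
    project_dir = os.path.dirname(os.path.abspath(app_info_path))
    root = ET.parse(app_info_path).getroot()
    missing = []
    for info in root.findall("file_info"):
        name = (info.findtext("name") or "").strip()
        if name and not os.path.exists(os.path.join(project_dir, name)):
            missing.append(name)
    return missing
```

Called as `missing_files("app_info.xml")` from inside the project directory, it would have reported milkyway_separation_0.48_x86_64-pc-linux-gnu__cuda_opencl in the situation Steve describes.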

Matt Arsenault Volunteer moderator Project developer Project tester Project scientist Joined: 8 May 10 Posts: 576 Credit: 15,979,383 RAC: 0	Message 44646 - Posted: 4 Dec 2010, 2:07:15 UTC - in response to Message 44645.
Quote: Well BOINC is rather eager to delete anything that isn't mentioned in any of the XML files. It looks like something else was wrong, and then this got deleted and it attempted to download and use the CUDA one. You might need to chown what you extract to boinc:boinc for it to work. It seems to be unhappy when the boinc user doesn't own the files.
Actually, I just checked this. The files don't need to be owned by boinc, but if they aren't, you need to be in the boinc group and the files need to be group-readable and group-executable.
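On Linux, the permission half of that advice (the equivalent of `chmod g+r`, plus `g+x` for executables) can be checked and applied with Python's standard `os` and `stat` modules. This is a sketch with hypothetical helper names; it only looks at the mode bits and does not check which group owns the files or whether the boinc user belongs to it:

```python
import os
import stat

def missing_group_bits(path, executable):
    """Return the group permission bits `path` still lacks:
    group-read always, plus group-execute if it is an executable."""
    wanted = stat.S_IRGRP | (stat.S_IXGRP if executable else 0)
    return wanted & ~os.stat(path).st_mode

def add_group_bits(path, executable):
    """Like `chmod g+r` (or `chmod g+rx` for executables); returns bits added."""
    lacking = missing_group_bits(path, executable)
    if lacking:
        current = stat.S_IMODE(os.stat(path).st_mode)
        os.chmod(path, current | lacking)
    return lacking
```

You would call `add_group_bits(path, executable=True)` on the application binary and `executable=False` on plain data files.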

tolafoph Joined: 24 Nov 10 Posts: 1 Credit: 41,702 RAC: 0	Message 44654 - Posted: 4 Dec 2010, 10:14:19 UTC - in response to Message 44646.
Hi, I'm new to MilkyWay@home, so I don't know exactly how long the WUs took with the CUDA app, but here are the numbers for the new OpenCL app.
BOINC 6.10.58, E6750 @ 3.2 GHz, GTX 260-216, driver 260.99, Vista 64
1,689.29 sec: 320.63 credits
914.38 sec: 213.76 credits
960.59 sec: 213.76 credits
GPU usage ~90%

europa Joined: 29 Oct 10 Posts: 89 Credit: 39,246,947 RAC: 0	Message 44657 - Posted: 4 Dec 2010, 12:38:46 UTC - in response to Message 44645.
Matt, thanks for the feedback. I see that I wasn't the owner of some of the files; I think I've caught all of them now. It sounds like I should purge the WUs in progress and re-extract the tar?
Thanks, Steve

Evil Penguin Joined: 9 Nov 09 Posts: 9 Credit: 19,728,072 RAC: 0	Message 44660 - Posted: 4 Dec 2010, 14:43:30 UTC
Sorry to go a bit off topic, but will the ATI OpenCL version come out soon?

Travis Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0	Message 44664 - Posted: 4 Dec 2010, 16:22:00 UTC - in response to Message 44624.
Quote: 615 secs??? wow... even if u run 2 WUs concurrent, it too much. my 4870 run it in 325 secs and 4890 in 312 secs. but taking in consideration dp cropped by nvidia in fermi cards... But anyway I'll try it on GTX275
Nvidia GPUs aren't nearly as fast as ATI GPUs at double-precision calculations, so that's really not too bad.

Zeddicus Joined: 30 May 10 Posts: 2 Credit: 2,351 RAC: 0	Message 44675 - Posted: 4 Dec 2010, 18:13:46 UTC - in response to Message 44664.
I'm taking part in some CPU-based projects (like climateprediction.net and yoyo@home) and was looking for another project to run on my GPU (besides SETI). In the past, MilkyWay told me that my GPU didn't have enough memory, so I thought the OpenCL version might run. After installing the package and updating my NVIDIA driver to 260.99, I was happy to get some WUs, but they all ended with a "computation error".
Okay, let's do it step by step... First, I updated BOINC to 6.10.58. Now MilkyWay says at start-up: "Message from server: Your app_info.xml file doesn't have a version of MilkyWay@Home N-Body Simulation." Hmm, what does that tell me? Did I make a mistake? Or do I need the previously mentioned 3.2 CUDA toolkit from that guy "Crunch3r"?
Any help is appreciated! By the way: GeForce 8800 GTS (driver 26099, CUDA version 3020, compute capability 1.0).
Greetings from Germany, Axel
P.S.: Bad English? Maybe that's because I left school 25 years ago...

Matt Arsenault Volunteer moderator Project developer Project tester Project scientist Joined: 8 May 10 Posts: 576 Credit: 15,979,383 RAC: 0	Message 44676 - Posted: 4 Dec 2010, 18:44:15 UTC - in response to Message 44675.
Quote: By the way: GeForce 8800 GTS (driver 26099, CUDA version 3020, compute capability 1.0).
That GPU doesn't support double precision, so it won't work. The app needs at least compute capability 1.3.
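That requirement can be written as a simple check. In this Python sketch (the helper names are hypothetical), a CUDA device qualifies from compute capability 1.3 up, and an OpenCL device on Nvidia's driver is treated as double-capable if its extension string lists cl_khr_fp64, as the GTX 460 log earlier in this thread does. The second check is an assumption about Nvidia's OpenCL driver and may not hold for other vendors:

```python
def cuda_supports_double(major, minor):
    """Double precision on CUDA devices needs compute capability >= 1.3."""
    return (major, minor) >= (1, 3)

def opencl_lists_fp64(extensions):
    """True if a device's OpenCL extension string contains cl_khr_fp64."""
    return "cl_khr_fp64" in extensions.split()

# Compute capabilities reported in this thread.
print(cuda_supports_double(1, 0))  # GeForce 8800 GTS -> False
print(cuda_supports_double(2, 1))  # GeForce GTX 460 -> True
```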

[AF>EDLS] Polynesia Joined: 5 Apr 09 Posts: 71 Credit: 6,120,786 RAC: 0	Message 44677 - Posted: 4 Dec 2010, 18:45:41 UTC Last modified: 4 Dec 2010, 18:49:39 UTC
app_info: try this file:
<app_info>
  <app>
    <name>milkyway</name>
  </app>
  <file_info>
    <name>milkyway_0.45_windows_x86_64.exe</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>milkyway</app_name>
    <version_num>45</version_num>
    <file_ref>
      <file_name>milkyway_0.45_windows_x86_64.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
  <app>
    <name>milkyway_nbody</name>
    <user_friendly_name>MilkyWay@Home nbody Simulation</user_friendly_name>
  </app>
  <file_info>
    <name>milkyway_nbody_0.21_windows_x86_64__sse2.exe</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>milkyway_nbody</app_name>
    <version_num>21</version_num>
    <file_ref>
      <file_name>milkyway_nbody_0.21_windows_x86_64__sse2.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
  <app>
    <name>milkyway</name>
    <user_friendly_name>Milkyway@home Separation</user_friendly_name>
  </app>
  <file_info>
    <name>milkyway_separation_0.48_windows_intelx86__cuda_opencl.exe</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>milkyway</app_name>
    <version_num>47</version_num>
    <plan_class>cuda_opencl</plan_class>
    <avg_ncpus>0.05</avg_ncpus>
    <max_ncpus>0.05</max_ncpus>
    <flops>1.0e11</flops>
    <coproc>
      <type>CUDA</type>
      <count>1</count>
    </coproc>
    <file_ref>
      <file_name>milkyway_separation_0.48_windows_intelx86__cuda_opencl.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
  <app>
    <name>milkyway</name>
  </app>
  <file_info>
    <name>milkyway_windows_intelx86__cuda23.exe</name>
    <executable/>
  </file_info>
  <file_info>
    <name>cudart.dll</name>
    <executable/>
  </file_info>
  <file_info>
    <name>cutil32.dll</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>milkyway</app_name>
    <version_num>24</version_num>
    <plan_class>cuda23</plan_class>
    <flops>1.0e11</flops>
    <avg_ncpus>0.1</avg_ncpus>
    <max_ncpus>0.1</max_ncpus>
    <coproc>
      <type>CUDA</type>
      <count>1.0</count>
    </coproc>
    <cmdline></cmdline>
    <file_ref>
      <file_name>milkyway_windows_intelx86__cuda23.exe</file_name>
      <main_program/>
    </file_ref>
    <file_ref>
      <file_name>cudart.dll</file_name>
    </file_ref>
    <file_ref>
      <file_name>cutil32.dll</file_name>
    </file_ref>
  </app_version>
</app_info>
Team Alliance francophone, BOINC 7.0.18, GA-P55-UD5, i7 860, Win 7 64-bit, 8 GB DDR3, GTX 470
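The server message Zeddicus quoted suggests the server looks for an `<app_version>` for each application it wants to send. To see which applications (and plan classes) a given app_info.xml actually covers, a short Python sketch using the standard `xml.etree.ElementTree` module can list them; the helper `app_versions` is hypothetical, and this only inspects the file, not what the server expects:

```python
import xml.etree.ElementTree as ET

def app_versions(app_info_xml):
    """Map each <app_name> in an app_info.xml string to a list of
    (version_num, plan_class) tuples; plan_class is "" when absent."""
    root = ET.fromstring(app_info_xml)
    versions = {}
    for av in root.findall("app_version"):
        name = (av.findtext("app_name") or "").strip()
        num = int(av.findtext("version_num") or 0)
        plan = (av.findtext("plan_class") or "").strip()
        versions.setdefault(name, []).append((num, plan))
    return versions
```

Applied to the file above, it reports versions 45, 47 (cuda_opencl) and 24 (cuda23) for milkyway, and version 21 for milkyway_nbody. A file that makes the server complain about N-Body would show no milkyway_nbody key at all.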