Message boards : News : OpenCL for Nvidia available for testing
| Author | Message |
|---|---|
|
The OpenCL application for Nvidia GPUs is ready for testing for Windows and Linux x86_64. I'm particularly interested in the performance / responsiveness tradeoff on mid-low range GPUs. | |
| ID: 44588 | Rating: 0 | rate:
| |
|
@ Matt - do you have a link for the download or will it download automatically as a stock app? | |
| ID: 44590 | Rating: 0 | rate:
| |
"@ Matt - do you have a link for the download or will it download automatically as a stock app?" You have to manually download from the links I posted. I'm not putting it up as stock yet. I think the server still needs to be updated for that to happen, and I want to hear if the system responsiveness is a problem on some lower end GPUs. | |
| ID: 44591 | Rating: 0 | rate:
| |
|
I just installed it on my GTX460 host and will see what it does on there. | |
| ID: 44592 | Rating: 0 | rate:
| |
|
Just tried to run that version on my GTX260. | |
| ID: 44604 | Rating: 0 | rate:
| |
|
I just downloaded and installed it. It fails immediately. Is there anything more to do? | |
| ID: 44606 | Rating: 0 | rate:
| |
|
Hello, | |
| ID: 44607 | Rating: 0 | rate:
| |
|
Thanks a lot, Matt :-) I'll try it later today on Ubuntu x64. | |
| ID: 44608 | Rating: 0 | rate:
| |
|
If I remember correctly, you will need at least a 26x.00 driver to use the openCL app. | |
| ID: 44609 | Rating: 0 | rate:
| |
|
BTW, Matt, can I run the OpenCL app on a GTX275, and which driver and CUDA versions do I need? I've got 195.30 and CUDA 2.3. | |
| ID: 44611 | Rating: 0 | rate:
| |
"If I remember correctly, you will need at least a 26x.00 driver to use the OpenCL app." Good idea ... thanks, it seems to be running now. | |
| ID: 44612 | Rating: 0 | rate:
| |
"BTW, Matt, can I run the OpenCL app on a GTX275, and which driver and CUDA versions do I need? I've got 195.30 and CUDA 2.3." The minimum driver which is supposed to work is 197.13, but I've only actually tested with the latest drivers. | |
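A minimal sketch of the driver comparison discussed here, assuming driver versions are plain dot-separated numbers. The function names are illustrative only and are not part of the MilkyWay@Home application or BOINC; the 197.13 minimum is simply the figure quoted in this thread, and only the latest drivers were reported as actually tested.

```python
def parse_driver_version(text):
    """Turn a driver string such as '197.13' into a comparable tuple (197, 13)."""
    return tuple(int(part) for part in text.split("."))


# Minimum driver quoted in this thread (assumption: it applies to all supported cards).
MIN_DRIVER = parse_driver_version("197.13")


def driver_is_new_enough(installed):
    """Return True if the installed driver is at or above the quoted minimum."""
    return parse_driver_version(installed) >= MIN_DRIVER


if __name__ == "__main__":
    for version in ("195.30", "197.13", "260.19.21", "263.06"):
        print(version, driver_is_new_enough(version))
```

Comparing tuples of integers avoids the trap of comparing version strings as text or as floats.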
| ID: 44613 | Rating: 0 | rate:
| |
"BTW, Matt, can I run the OpenCL app on a GTX275, and which driver and CUDA versions do I need? I've got 195.30 and CUDA 2.3." http://www.nvidia.com/object/linux-display-amd64-260.19.21-driver.html | |
| ID: 44614 | Rating: 0 | rate:
| |
|
Thanks, guys :-) | |
| ID: 44615 | Rating: 0 | rate:
| |
"Just tried to run that version on my GTX260." The workunit is gone now, so I can't tell. | |
| ID: 44616 | Rating: 0 | rate:
| |
|
Is anyone running it in a system with both ATI and Nvidia drivers installed? I just realized a hypothetical problem that might happen. | |
| ID: 44617 | Rating: 0 | rate:
| |
|
ok, I understand - I have to move to CUDA 3.0 at least. | |
| ID: 44618 | Rating: 0 | rate:
| |
|
Why don't the stats of these test units stay posted in the stats on the project account? | |
| ID: 44619 | Rating: 0 | rate:
| |
"If I remember correctly, you will need at least a 26x.00 driver to use the OpenCL app." Upgrading the driver did the trick, you were right on this. I had a version below 260.x. Thanks for that hint! The WUs are running fine now. :-) ____________ Member of BOINC@Heidelberg and ATA! | |
| ID: 44620 | Rating: 0 | rate:
| |
"Why don't the stats of these test units stay posted in the stats on the project account?" I don't know. I consistently have a problem with this. | |
| ID: 44621 | Rating: 0 | rate:
| |
"The WUs are running fine now. :-)" How many secs on which card? | |
| ID: 44622 | Rating: 0 | rate:
| |
|
Here is the stderr that came up on my 460.

Run time: 614.917938
CPU time: 31.6875

stderr out:

<core_client_version>6.12.6</core_client_version>
<![CDATA[
<stderr_txt>
<search_application> milkywayathome separation 0.48 Windows x86 double OpenCL </search_application>
Found 1 platforms
Platform 0 information:
  Platform name: NVIDIA CUDA
  Platform version: OpenCL 1.0 CUDA 3.2.1
  Platform vendor:
  Platform profile:
  Platform extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
Using device 0 on platform 0
Found 1 CL devices
Device GeForce GTX 460 (NVIDIA Corporation:0x10de)
  Type: CL_DEVICE_TYPE_GPU
  Driver version: 263.06
  Version: OpenCL 1.0 CUDA
  Compute capability: 2.1
  Little endian: CL_TRUE
  Error correction: CL_FALSE
  Image support: CL_TRUE
  Address bits: 32
  Max compute units: 7
  Clock frequency: 1600 Mhz
  Global mem size: 804847616
  Max mem alloc: 201211904
  Global mem cache: 114688
  Cacheline size: 128
  Local mem type: CL_LOCAL
  Local mem size: 49152
  Max const args: 9
  Max const buf size: 65536
  Max parameter size: 4352
  Max work group size: 1024
  Max work item dim: 3
  Max work item sizes: { 1024, 1024, 64 }
  Mem base addr align: 4096
  Min type align size: 128
  Timer resolution: 1000 ns
  Double extension: MW_CL_KHR_FP64
  Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64
Compiler flags: -cl-mad-enable -cl-no-signed-zeros -cl-strict-aliasing -cl-finite-math-only -DUSE_CL_MATH_TYPES=0 -DUSE_MAD=0 -DUSE_FMA=0 -cl-nv-verbose -DDOUBLEPREC=1 -DMILKYWAY_MATH_COMPILATION -DNSTREAM=3 -DFAST_H_PROB=1 -DAUX_BG_PROFILE=0 -DUSE_IMAGES=1 -DI_DONT_KNOW_WHY_THIS_DOESNT_WORK_HERE=1
Build status: CL_BUILD_SUCCESS
Build log: : Considering profile 'compute_20' for gpu='sm_21' in 'cuModuleLoadDataEx_4'
Kernel work group info:
  Work group size = 512
  Kernel local mem size = 0
  Compile work group size = { 0, 0, 0 }
Lower n solution: n = 40, x = 0
Higher n solution: n = 40, x = 0
Using solution: n = 40, x = 0
Range: { nu_steps = 640, mu_steps = 1600, r_steps = 1400 }
Iteration area: 2240000
Chunk estimate: 40
Num chunks: 40
Added area: 0
Effective area: 2240000
Integration time: 584.577879 s. Average time per iteration = 913.402936 ms
<background_integral> 0.00057622614096211838 </background_integral>
<stream_integrals> 105.68890505238272000000 172.60556751830708000000 161.27537590087846000000 </stream_integrals>
<background_only_likelihood> -3.29958249045773530000 </background_only_likelihood>
<stream_only_likelihood> -37.34692154949922100000 -4.75605645991618700000 -3.79966623105932080000 </stream_only_likelihood>
<search_likelihood> -3.00484287881831240000 </search_likelihood>
10:44:59 (2892): called boinc_finish
</stderr_txt>
]]>

Validate state: Valid
Claimed credit: 0.177164871378224
Granted credit: 213.760359413782 | |
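As a sanity check on the figures in this log, here is a short sketch of the arithmetic. Reading the "iteration area" as mu_steps times r_steps, and the "average time per iteration" as the integration time divided by nu_steps, is an interpretation of the numbers and not something the log itself states; the variable names are illustrative.

```python
# Values copied from the stderr output of the GTX 460 run.
nu_steps, mu_steps, r_steps = 640, 1600, 1400
integration_time_s = 584.577879
num_chunks = 40

# Assumption: one "iteration" covers the full mu x r plane for a single nu step.
iteration_area = mu_steps * r_steps

# Assumption: the area is split evenly across the chunks ("Added area: 0").
area_per_chunk = iteration_area // num_chunks

# Assumption: the reported average is the integration time per nu step.
avg_ms_per_iteration = integration_time_s / nu_steps * 1000.0

if __name__ == "__main__":
    print("iteration area:", iteration_area)
    print("area per chunk:", area_per_chunk)
    print("average ms per iteration: %.6f" % avg_ms_per_iteration)
```

Under these assumptions the computed area and average agree with the "Iteration area" and "Average time per iteration" lines in the log.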
| ID: 44623 | Rating: 0 | rate:
| |
|
615 secs??? Wow... even if you run 2 WUs concurrently, it's too much. My 4870 runs it in 325 secs and my 4890 in 312 secs. But taking into consideration the DP cropped by Nvidia in Fermi cards... | |
| ID: 44624 | Rating: 0 | rate:
| |
|
Yes I know, my 5830 runs the same type of unit in 130 seconds. | |
| ID: 44625 | Rating: 0 | rate:
| |
|
So, there is room for improvement :-) | |
| ID: 44628 | Rating: 0 | rate:
| |
"So, there is some room for improvement :-)" The theoretical performance of doubles on ATI hardware is much higher than on Nvidia, so it's not expected to match that. It's expected to be about the same or marginally faster than the old CUDA application. | |
| ID: 44630 | Rating: 0 | rate:
| |
"So, there is some room for improvement :-)" "The theoretical performance of doubles on ATI hardware is much higher than on Nvidia, so it's not expected to match that. It's expected to be about the same or marginally faster than the old CUDA application." With the old CUDA app that Crunch3r fixed for Fermi cards I was running around 11 minutes for the 213 point units. | |
| ID: 44631 | Rating: 0 | rate:
| |
|
Yep, agreed. DP is 1/8 of SP instead of 1/2... so there are no changes in my plans to get a pair of 6970s :-) | |
| ID: 44632 | Rating: 0 | rate:
| |
|
I now have a couple of WUs finished. | |
| ID: 44636 | Rating: 0 | rate:
| |
|
I have tested the new OpenCL app with a Geforce GT 420m notebook GPU. | |
| ID: 44638 | Rating: 0 | rate:
| |
|
Matt, | |
| ID: 44644 | Rating: 0 | rate:
| |
"Matt," Well, BOINC is rather eager to delete anything that isn't mentioned in any of the XML files. It looks like something else was wrong, and then this got deleted and it attempted to download and use the CUDA one. You might need to chown what you extract to boinc:boinc for it to work. It seems to be unhappy when the boinc user doesn't own the files. "Also, does:" It's the count of GPUs that will be used. The application only uses 1, so it should always be 1. | |
| ID: 44645 | Rating: 0 | rate:
| |
"Well, BOINC is rather eager to delete anything that isn't mentioned in any of the XML files. It looks like something else was wrong, and then this got deleted and it attempted to download and use the CUDA one. You might need to chown what you extract to boinc:boinc for it to work. It seems to be unhappy when the boinc user doesn't own the files." Actually, I just checked this. The files don't need to be owned by boinc, but if they aren't, you need to be in the boinc group and the files need to be group readable and executable. | |
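A small sketch of how one might check the permission bits described here on Linux. The helper name is hypothetical and not part of BOINC; it only inspects the group read and execute bits of a file, and does not verify which group owns the file or whether your user is a member of the boinc group.

```python
import os
import stat


def group_can_read_and_execute(path):
    """Return True if the file's group permission bits include both read and execute."""
    mode = os.stat(path).st_mode
    needed = stat.S_IRGRP | stat.S_IXGRP
    return mode & needed == needed


if __name__ == "__main__":
    import sys

    # Usage: pass the extracted application files as arguments.
    for name in sys.argv[1:]:
        status = "ok" if group_can_read_and_execute(name) else "needs chmod g+rx"
        print(name, status)
```

Run against the files extracted into the project directory, this flags any file that would need its group bits changed under the rule stated in the post.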
| ID: 44646 | Rating: 0 | rate:
| |
|
Hi, | |
| ID: 44654 | Rating: 0 | rate:
| |
|
Matt, | |
| ID: 44657 | Rating: 0 | rate:
| |
|
Sorry to go a bit off topic, but will the ATi OpenCL version come out soon? | |
| ID: 44660 | Rating: 0 | rate:
| |
"615 secs??? Wow... even if you run 2 WUs concurrently, it's too much. My 4870 runs it in 325 secs and my 4890 in 312 secs. But taking into consideration the DP cropped by Nvidia in Fermi cards..." Nvidia GPUs aren't nearly as fast as the ATI GPUs for double precision calculations. So that's really not too bad. | |
| ID: 44664 | Rating: 0 | rate:
| |
|
I'm taking part in some CPU-based projects (like climateprediction.net and yoyo@home) and was looking for another project to run on my GPU (besides SETI). In the past milkyway told me that my GPU was lacking memory, so I thought that maybe the OpenCL version would run. After installing the package and updating my NVIDIA driver to 260.99 I was happy to get some WUs, but they all ended up with "calculating error". Okay, let's do it step by step... So first I updated BOINC to 6.10.58. Now milkyway says at start-up "Message from server: Your app_info.xml file doesn't have a version of MilkyWay@Home N-Body Simulation." | |
| ID: 44675 | Rating: 0 | rate:
| |
"By the way: GeForce 8800 GTS (driver 26099, CUDA version 3020, compute capability 1.0)." That GPU doesn't have doubles and won't work. It needs at least compute capability 1.3. | |
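The rule stated here can be sketched as a simple comparison. The function name is illustrative and not taken from the application; the 1.3 threshold is the one given in this post, and the sample values are the compute capabilities reported in this thread.

```python
def supports_doubles(compute_capability):
    """Return True if a compute capability string like '1.3' meets the 1.3 minimum."""
    major, minor = (int(part) for part in compute_capability.split("."))
    return (major, minor) >= (1, 3)


if __name__ == "__main__":
    # GeForce 8800 GTS reports 1.0; the GTX 460 log earlier in the thread reports 2.1.
    for name, cc in (("GeForce 8800 GTS", "1.0"), ("GeForce GTX 460", "2.1")):
        print(name, cc, supports_doubles(cc))
```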
| ID: 44676 | Rating: 0 | rate:
| |
|
app_info: try with this file: | |
| ID: 44677 | Rating: 0 | rate:
| |
"By the way: GeForce 8800 GTS (driver 26099, CUDA version 3020, compute capability 1.0)." "That GPU doesn't have doubles and won't work. It needs at least compute capability 1.3." Thanks for the information! But why did milkyway send me any WUs then? | |
| ID: 44678 | Rating: 0 | rate:
| |
"By the way: GeForce 8800 GTS (driver 26099, CUDA version 3020, compute capability 1.0)." "That GPU doesn't have doubles and won't work. It needs at least compute capability 1.3." If you tried manually installing this, it will try sending the workunits to it. You shouldn't get sent the Nvidia applications since you don't have doubles. | |
| ID: 44681 | Rating: 0 | rate:
| |
"Is anyone running it in a system with both ATI and Nvidia drivers installed? I just realized a hypothetical problem that might happen." I now have a system running which has a GTX460 and an HD4850, running both MW and Collatz on both GPUs. No problems seen in the last 12 hours. http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=168415 | |
| ID: 44682 | Rating: 0 | rate:
| |
"If you tried manually installing this, it will try sending the workunits to it. You shouldn't get sent the Nvidia applications since you don't have doubles." To be honest, this is not the way BOINC should be set to work. On your server, you should analyze the host requesting work, and even if the user has installed the app manually, the server should not send work when the host doesn't satisfy all the requirements. If you leave it this way, it is really easy to trick your server into doing really bad things! | |
| ID: 44683 | Rating: 0 | rate:
| |
"To be honest, this is not the way BOINC should be set to work." This is related to the problem I think is most annoying in BOINC. I think everything involving the version and capability management in BOINC should be better; pretty much everything about app_info.xml and the scheduling isn't user friendly and is inflexible. The plan class system is inflexible; adding versions with different system requirements involves modifying the server code and it isn't composable for different features. app_info.xml doesn't handle updates and requires far too much manual intervention for what most people want. You might only want to run GPU workunits, or only N-body on the CPU and separation on the GPU, but right now you can't do that. You can say no CPU or no GPU, but not on a per application basis. You have to manually download files, put in the same information in several places in app_info.xml, and then you're stuck with whatever version you happened to install unless you go out of your way to update it. There should be a finer grain way of telling the server what capabilities different applications and possibly workunits require, and on the client there should be a way of specifying capabilities you want to use with an actual interface of some sort beyond just stating that manually installed application X should be used in an XML file, and it should handle updates automatically. I kind of want to work on a replacement that's more intelligent, but I don't really have the time. | |
| ID: 44684 | Rating: 0 | rate:
| |
|
OK Matt, if you're still interested, your new app takes a little over 17 minutes to complete 1 opencl wu. This is with Seti@Home working on both cpus (e6600@2.4GHz) with Folding@home working also. Windows XP32, Nvidia driver 206.63. | |
| ID: 44686 | Rating: 0 | rate:
| |
|
The update I've posted should help with better errors when the GPU doesn't have doubles, and should help with system responsiveness on lower end GPUs. | |
| ID: 44690 | Rating: 0 | rate:
| |
|
What updates? an updated application? | |
| ID: 44716 | Rating: 0 | rate:
| |
"What updates? an updated application?" The second set of links I added to the original post. | |
| ID: 44717 | Rating: 0 | rate:
| |
|
OK, thank you, but I did not understand: what are the minor updates to the application? | |
| ID: 44718 | Rating: 0 | rate:
| |
|
With 48.0 I found that the runtimes are over 2m longer than with the CUDA app on my GTX260. 15:40 - 16:00 for OpenCL and 13:22 - 13:31 for CUDA. | |
| ID: 44749 | Rating: 0 | rate:
| |
"With 48.0 I found that the runtimes are over 2m longer than with the CUDA app on my GTX260. 15:40 - 16:00 for OpenCL and 13:22 - 13:31 for CUDA." Are you sure you're comparing the same things? Some of the newer workunits are quite a bit larger than they have been in the past. If it's not that, there might be some mysterious problem I haven't quite figured out, where there's a drop in performance at some points with how I break the problem up to keep the system responsive. There seemed to be strange peaks in run time at some points I try, and I haven't quite figured out a good rule for different GPUs. I think I had something, but I haven't actually played with it on slower GPUs. For me on the 285, it seems to be about 2% faster than the CUDA one. "The memory bandwidth used on the card is up to about 50% as well which is a HUGE jump from CUDA (0%). As well as having a larger system memory footprint. Just found that interesting, it doesn't particularly concern me." I don't see why that would happen. There's basically no transfer done except at the beginning / end. "And it also uses more CPU time than the CUDA app, though I didn't notice the cpu being used while I was watching the task manager. Does it use most / all of the cycles at the start or end of the WU or something?" Quite likely. I think the old CUDA one did the final likelihood calculation on the GPU, which takes a few seconds at the end on the CPU, but isn't actually worth the effort to do it on the GPU. | |
| ID: 44754 | Rating: 0 | rate:
| |
|
Matt, | |
| ID: 44755 | Rating: 0 | rate:
| |
"I downloaded the WUs all today and within about an hour of each other. I did 4 OpenCL then 3 CUDA before posting and another few CUDA after posting (with the same type of run time seen)." The current workunits aren't uniformly larger. There is a mixture of different sizes out right now, so it's hard to tell if this means anything without knowing which workunits you ran on each. | |
| ID: 44757 | Rating: 0 | rate:
| |
|
After I posted the last one I figured you would need more details since the deleter is deleting things right away. (Looks like I was right) So I went about getting them for you. | |
| ID: 44761 | Rating: 0 | rate:
| |
|
Matt, | |
| ID: 44805 | Rating: 0 | rate:
| |
|
Thanks to everyone who posted information. I've looked at the pieces, and I think I've pieced together why I made the slower GPUs slower; I half missed something obvious. | |
| ID: 44812 | Rating: 0 | rate:
| |
|
I've posted another minor update which should hopefully fix being slower on some GPUs. | |
| ID: 44820 | Rating: 0 | rate:
| |
|
Big trouble!! | |
| ID: 44853 | Rating: 0 | rate:
| |
|
Matt, | |
| ID: 44859 | Rating: 0 | rate:
| |
|
You are copying the app_info.xml file as well, I hope? That is what tells BOINC which app to use with the project. | |
| ID: 44860 | Rating: 0 | rate:
| |
|
Matt, | |
| ID: 44863 | Rating: 0 | rate:
| |
"I'm now trying 0.48.2 and my PC is having slowdowns that it did not have with 0.48.1 ... In addition, the unit does not and this calculation is boosting the temperature of my card .." Apparently the formula that I used to keep the system responsive fell apart for the 470's specifications. The temperature is most definitely expected to go up. | |
| ID: 44866 | Rating: 0 | rate:
| |
|
What are you using to open the .xml files? It almost looks like a browser.

<app_info>
  <app>
    <name>milkyway</name>
    <user_friendly_name>Milkyway@home Separation</user_friendly_name>
  </app>
  <file_info>
    <name>milkyway_separation_0.48.2_x86_64-pc-linux-gnu__cuda_opencl</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>milkyway</app_name>
    <version_num>48</version_num>
    <plan_class>cuda_opencl</plan_class>
    <avg_ncpus>0.05</avg_ncpus>
    <max_ncpus>0.05</max_ncpus>
    <flops>1.0e11</flops>
    <coproc>
      <type>CUDA</type>
      <count>1</count>
    </coproc>
    <file_ref>
      <file_name>milkyway_separation_0.48.2_x86_64-pc-linux-gnu__cuda_opencl</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info> | |
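Since the executable name appears twice in this app_info.xml (once under file_info and once under file_ref), a mismatch between the two is an easy mistake when editing by hand. Here is a minimal sketch of a consistency check using Python's standard xml.etree.ElementTree. The helper name is hypothetical, and the check reflects only the structure visible in the file from this post, not BOINC's own validation rules.

```python
import xml.etree.ElementTree as ET

# The app_info.xml content from this post, embedded for a self-contained example.
APP_INFO = """<app_info>
  <app>
    <name>milkyway</name>
    <user_friendly_name>Milkyway@home Separation</user_friendly_name>
  </app>
  <file_info>
    <name>milkyway_separation_0.48.2_x86_64-pc-linux-gnu__cuda_opencl</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>milkyway</app_name>
    <version_num>48</version_num>
    <plan_class>cuda_opencl</plan_class>
    <avg_ncpus>0.05</avg_ncpus>
    <max_ncpus>0.05</max_ncpus>
    <flops>1.0e11</flops>
    <coproc>
      <type>CUDA</type>
      <count>1</count>
    </coproc>
    <file_ref>
      <file_name>milkyway_separation_0.48.2_x86_64-pc-linux-gnu__cuda_opencl</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>"""


def missing_file_infos(xml_text):
    """Return file names referenced by a file_ref but not declared in any file_info."""
    root = ET.fromstring(xml_text)
    declared = {info.findtext("name") for info in root.findall("file_info")}
    referenced = {ref.findtext("file_name") for ref in root.iter("file_ref")}
    return sorted(referenced - declared)


if __name__ == "__main__":
    print(missing_file_infos(APP_INFO))
```

An empty result means every referenced file has a matching file_info entry; it says nothing about whether the file actually exists in the project directory.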
| ID: 44870 | Rating: 0 | rate:
| |
|
It does. | |
| ID: 44887 | Rating: 0 | rate:
| |
|
Matt, | |
| ID: 44890 | Rating: 0 | rate:
| |
"But, on the problem machine it is missing the reference line to 'app_info.xml; using anonymous platform' at that spot." Are you sure you're putting it in the right place? It needs to be in the milkyway directory under projects. "I just saw your posting about the new version being pushed out by the server and the removal of the cuda version. I'm hoping that will fix things on this second machine." The CUDA removal hasn't happened yet. | |
| ID: 44893 | Rating: 0 | rate:
| |
|
Matt, | |
| ID: 44898 | Rating: 0 | rate:
| |
|
Matt, | |
| ID: 44919 | Rating: 0 | rate:
| |