Nvidia OpenCL updated

Author	Message
Matt Arsenault Volunteer moderator Project developer Project tester Project scientist Joined: 8 May 10 Posts: 576 Credit: 15,979,383 RAC: 0	Message 46058 - Posted: 8 Feb 2011, 1:00:11 UTC I've updated the Nvidia/OpenCL application to 0.52 which should fix the failures on the 23* tasks.

Keith Myers Joined: 24 Jan 11 Posts: 738 Credit: 566,066,763 RAC: 13,027	Message 46067 - Posted: 8 Feb 2011, 20:36:35 UTC - in response to Message 46058. Matt, how do I find the name of the new 0.52 OpenCL app so I can update my app_info file to download it? I see it listed in the project apps list but can't figure out how to get it without reverting back to no app_info. I use app_info to change my count to .5. Thanks, Keith

Keith Myers Joined: 24 Jan 11 Posts: 738 Credit: 566,066,763 RAC: 13,027	Message 46068 - Posted: 8 Feb 2011, 20:50:17 UTC - in response to Message 46067. Never mind, I found the download directory. Keith

Paul Sands Joined: 6 Oct 07 Posts: 1 Credit: 86,122,523 RAC: 35,775	Message 46069 - Posted: 8 Feb 2011, 21:04:17 UTC All my Linux hosts seem to be failing all tasks with the new 0.52 OpenCL app. I have set them to no new work. So far my Windows hosts are doing fine with the new 0.52 OpenCL app.

Keith Myers Joined: 24 Jan 11 Posts: 738 Credit: 566,066,763 RAC: 13,027	Message 46070 - Posted: 8 Feb 2011, 21:23:51 UTC - in response to Message 46069. Yes, I am also getting computation errors on all tasks with the new 0.52 Linux OpenCL app. I will revert to the 0.50 app until it gets figured out. Keith

[AF>EDLS]GuL Joined: 5 Jun 08 Posts: 21 Credit: 245,803,013 RAC: 0	Message 46071 - Posted: 8 Feb 2011, 23:04:24 UTC - in response to Message 46070. I have the same problem: all the 0.52 WUs are failing on my Ubuntu 10.10 Linux host http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=66288 I am using Nvidia 270.18 x64 with a GTX 260 and BOINC 6.10.58 x64, and I have reset the project. Any help, please?

Matt Arsenault Volunteer moderator Project developer Project tester Project scientist Joined: 8 May 10 Posts: 576 Credit: 15,979,383 RAC: 0	Message 46072 - Posted: 8 Feb 2011, 23:07:56 UTC - in response to Message 46069. "All my Linux hosts seem to be failing all tasks with the new 0.52 OpenCL app." I made a really dumb mistake in the Linux build. Should be fixed now (0.54).

Keith Myers Joined: 24 Jan 11 Posts: 738 Credit: 566,066,763 RAC: 13,027	Message 46076 - Posted: 9 Feb 2011, 7:09:35 UTC - in response to Message 46072. Matt, thanks for making the new build so quickly and fixing the problem. Running 0.54 quite well now. Keith

europa Joined: 29 Oct 10 Posts: 89 Credit: 39,246,947 RAC: 0	Message 46085 - Posted: 9 Feb 2011, 10:12:52 UTC - in response to Message 46072. Matt, Thanks for the update. I too was having huge problems. Looking forward to producing good WUs again. Regards, Steve Ubuntu 10.04

Landon Oswalt Joined: 8 Jan 11 Posts: 1 Credit: 1,584,923 RAC: 0	Message 46103 - Posted: 9 Feb 2011, 20:40:03 UTC Matt, I'm running the 0.52 version on two GTX 460s, and for some reason the cards will not finish a unit. It keeps working until I close BOINC. One unit ran up to 175%. I used to complete a unit in 12 minutes; now it takes hours. Any ideas how to fix this problem? Happy crunching, Landon Oswalt

Matt Arsenault Volunteer moderator Project developer Project tester Project scientist Joined: 8 May 10 Posts: 576 Credit: 15,979,383 RAC: 0	Message 46107 - Posted: 9 Feb 2011, 20:53:56 UTC - in response to Message 46103. "Matt, I'm running the 0.52 version on two GTX 460s, and for some reason the cards will not finish a unit. It keeps working until I close BOINC. One unit ran up to 175%. I used to complete a unit in 12 minutes; now it takes hours. Any ideas how to fix this problem?" A bunch of workunits were started which were way too big and were taking too long on CPUs and many weaker GPUs. The total number of steps in the progress calculation was overflowing the 32-bit limit and wrapping around, causing progress bars to go over 100%. These should go away soon.
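The wraparound Matt describes can be sketched roughly as follows. This is only an illustration, not the project's code: it assumes the total step count is the product of three step counts stored in an unsigned 32-bit integer. The step counts are borrowed from the stderr log posted later in this thread, and the helper names `wrap_u32` and `progress_fraction` are hypothetical.

```python
U32_MASK = 0xFFFFFFFF  # largest value an unsigned 32-bit integer can hold


def wrap_u32(value: int) -> int:
    """Emulate unsigned 32-bit wraparound (hypothetical helper)."""
    return value & U32_MASK


def progress_fraction(steps_done: int, total_steps: int) -> float:
    """Fraction of work completed, as a progress bar would compute it."""
    return steps_done / total_steps


# Assumed step counts for one oversized workunit (illustrative only).
nu_steps, mu_steps, r_steps = 1500, 3500, 3000

true_total = nu_steps * mu_steps * r_steps  # exceeds 2**32
wrapped_total = wrap_u32(true_total)        # what a 32-bit counter would store

halfway = true_total // 2
correct = progress_fraction(halfway, true_total)
broken = progress_fraction(halfway, wrapped_total)

print(f"true total:    {true_total}")
print(f"wrapped total: {wrapped_total}")
print(f"progress at the halfway point: {correct:.0%} (correct) vs {broken:.0%} (wrapped)")
```

Because the wrapped total is smaller than the real amount of work, the reported fraction passes 100% long before the task is actually finished, which matches the symptom reported above.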

europa Joined: 29 Oct 10 Posts: 89 Credit: 39,246,947 RAC: 0	Message 46113 - Posted: 9 Feb 2011, 23:02:50 UTC - in response to Message 46107. I'm still having major problems on my machines (Ubuntu 10.04 with GTX 460 cards). Up until about 5 days ago, things were running great: the cards were switching off with Einstein, running 2 WUs simultaneously. Life was good. Now, despite detaching and re-attaching through BoincStats and doing anything else I can think of, I can't even get work units or the apps downloaded. The one machine that had a big backlog of v0.50 WUs was unaffected but has now finished all of those WUs. It has been showing 3 WUs for nbodySim 0.21 in "downloading" status for days, but nothing has come through. Einstein is having a field day with all of the GPUs to itself. Is there anything that I can do from here to get things going again with MW? Thanks for the help. Regards, Steve

Jesse Viviano Joined: 4 Feb 11 Posts: 86 Credit: 60,913,150 RAC: 0	Message 46114 - Posted: 9 Feb 2011, 23:14:34 UTC - in response to Message 46113. "I'm still having major problems on my machines (Ubuntu 10.04 with GTX 460 cards). Up until about 5 days ago, things were running great: the cards were switching off with Einstein, running 2 WUs simultaneously. Life was good. Now, despite detaching and re-attaching through BoincStats and doing anything else I can think of, I can't even get work units or the apps downloaded. The one machine that had a big backlog of v0.50 WUs was unaffected but has now finished all of those WUs. It has been showing 3 WUs for nbodySim 0.21 in "downloading" status for days, but nothing has come through. Einstein is having a field day with all of the GPUs to itself. Is there anything that I can do from here to get things going again with MW? Thanks for the help. Regards, Steve" If you check the server status page, you will notice that there are no work units to download. Something is messed up with the server at this moment.

europa Joined: 29 Oct 10 Posts: 89 Credit: 39,246,947 RAC: 0	Message 46137 - Posted: 10 Feb 2011, 17:41:05 UTC - in response to Message 46114. Well, at least I won't be sending back bad WUs anymore due to computation errors. Hope it all gets sorted out. Judging from the various postings, it sounds like multiple unrelated problems. Regards, Steve

Keith Myers Joined: 24 Jan 11 Posts: 738 Credit: 566,066,763 RAC: 13,027	Message 46139 - Posted: 10 Feb 2011, 19:08:03 UTC - in response to Message 46137. I just switched back over to the Linux side and now have 14 tasks that exited with a compute error on the new 0.54 Linux OpenCL app. Could this be because of the recent incorrectly sized work that was sent out? Here is a shortened result from a task that errored out:
<core_client_version>6.12.12</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
<search_application> milkywayathome separation 0.54 Linux x86_64 double OpenCL </search_application>
Found 1 platforms
Platform 0 information:
Platform name: NVIDIA CUDA
Platform version: OpenCL 1.0 CUDA 3.2.1
Platform vendor:
Platform profile:
Platform extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
Using device 0 on platform 0
Found 1 CL devices
Device GeForce GTX 460 (NVIDIA Corporation:0x10de)
Type: CL_DEVICE_TYPE_GPU
Driver version: 260.19.36
Version: OpenCL 1.0 CUDA
Compute capability: 2.1
Little endian: CL_TRUE
Error correction: CL_FALSE
Image support: CL_TRUE
Address bits: 32
Max compute units: 7
Clock frequency: 1430 Mhz
Global mem size: 1072889856
Max mem alloc: 268222464
Global mem cache: 114688
Cacheline size: 128
Local mem type: CL_LOCAL
Local mem size: 49152
Max const args: 9
Max const buf size: 65536
Max parameter size: 4352
Max work group size: 1024
Max work item dim: 3
Max work item sizes: { 1024, 1024, 64 }
Mem base addr align: 4096
Min type align size: 128
Timer resolution: 1000 ns
Double extension: MW_CL_KHR_FP64
Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64
Compiler flags: -cl-mad-enable -cl-no-signed-zeros -cl-strict-aliasing -cl-finite-math-only -DUSE_CL_MATH_TYPES=0 -DUSE_MAD=1 -DUSE_FMA=0 -cl-nv-verbose -DDOUBLEPREC=1 -DMILKYWAY_MATH_COMPILATION -DNSTREAM=1 -DFAST_H_PROB=1 -DAUX_BG_PROFILE=0 -DUSE_IMAGES=1 -DI_DONT_KNOW_WHY_THIS_DOESNT_WORK_HERE=1
Build status: CL_BUILD_SUCCESS
Build log: : Considering profile 'compute_20' for gpu='sm_21' in 'cuModuleLoadDataEx_4'
Kernel work group info:
Work group size = 576
Kernel local mem size = 0
Compile work group size = { 0, 0, 0 }
Group size = 64, per CU = 32, threads per CU = 2048
Block size = 14336
Desired = 367
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Didn't find a solution. Using fallback solution n = 375, x = 0
Using solution: n = 375, x = 0
Range: { nu_steps = 1500, mu_steps = 3500, r_steps = 3000 }
Iteration area: 10500000
Chunk estimate: 367
Num chunks: 375
Added area: 0
Effective area: 10500000
Block size: 14336
Global dimensions not divisible by local
Failed to find good run sizes
Failed to finish: CL_INVALID_COMMAND_QUEUE
Failed to run nu step: CL_INVALID_COMMAND_QUEUE
Failed to calculate integral 0
02:49:02 (2522): called boinc_finish
</stderr_txt>
]]>
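The "Global dimensions not divisible by local" message in the log above points at an OpenCL 1.x rule: when a work-group (local) size is given explicitly, the global work size must be an exact multiple of it. The sketch below checks that condition using figures from the log. It is a guess at the arithmetic, not the application's actual chunking logic: it assumes the per-chunk global size is the effective area divided by the number of chunks, and that the local size is the logged group size of 64. The helper names are hypothetical.

```python
def is_divisible(global_size: int, local_size: int) -> bool:
    """True if the global work size is an exact multiple of the local size."""
    return global_size % local_size == 0


def round_up_to_multiple(global_size: int, local_size: int) -> int:
    """Smallest multiple of local_size that is >= global_size."""
    return -(-global_size // local_size) * local_size


# Figures taken from the log; how they combine is an assumption.
effective_area = 10_500_000  # "Effective area"
num_chunks = 375             # fallback solution "n = 375"
group_size = 64              # "Group size = 64"

chunk_size = effective_area // num_chunks
padded = round_up_to_multiple(chunk_size, group_size)

print(f"work items per chunk: {chunk_size}")
print(f"divisible by {group_size}? {is_divisible(chunk_size, group_size)}")
print(f"next valid global size: {padded}")
```

Padding the global size this way is a common workaround in general, but the kernel then has to skip the extra work items past the real end of the range.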

Roel Joined: 4 Aug 08 Posts: 1 Credit: 526,155 RAC: 0	Message 46217 - Posted: 13 Feb 2011, 14:46:57 UTC All my 0.52 (cuda_opencl) WUs end within a second, or a few seconds at most, with a computation error. My laptop uses an NVIDIA GeForce GT 445M. What could be wrong?

DanNeely Joined: 6 Oct 09 Posts: 39 Credit: 78,881,405 RAC: 0	Message 46227 - Posted: 13 Feb 2011, 22:41:01 UTC I'm seeing 100% failure with Win7-64 and GTX 260s.

Keith Myers Joined: 24 Jan 11 Posts: 738 Credit: 566,066,763 RAC: 13,027	Message 46229 - Posted: 14 Feb 2011, 2:03:56 UTC - in response to Message 46227. Other than the few WUs that were too large and errored out, I seem to be running OpenCL WUs successfully on the 64-bit Linux 0.54 app. I just looked at two WUs that were just reported, and they validated. Maybe Matt needs to look at the build of the 0.52 Windows app and see if he missed something like the obvious goof he made on the Linux 0.52 app. Keith

Dirk Sadowski Joined: 30 Apr 09 Posts: 101 Credit: 29,874,293 RAC: 0	Message 46231 - Posted: 14 Feb 2011, 3:39:12 UTC Oh, a pity. Has MW@h canceled the CUDA apps? Is it OpenCL now? AFAIK, at least the 197.x Nvidia driver is needed. But my machines need to stay with 190.38, which gives the best performance with the stock CUDA23 app at S@h. I tested one MW@h WU with the new OpenCL app and it errored immediately. But why did my machine with 190.38 get the OpenCL app? Isn't it possible (via the server) to send the OpenCL app only to hosts with at least the 197.x Nvidia driver? If there are a lot of hosts with drivers older than 197.x out there, project server performance is wasted: download/error, download/error, download/error... Is it possible to use the old (IIRC 0.24 CUDA23) app via app_info.xml? Or does this app not work with the new WUs? BTW, in the past I saw a German translation of this site. For the past few months it has been English only. Is that a mistake or intended?

Werkstatt Joined: 19 Feb 08 Posts: 350 Credit: 141,284,369 RAC: 0	Message 46234 - Posted: 14 Feb 2011, 6:13:10 UTC - in response to Message 46231. "Oh, a pity. Has MW@h canceled the CUDA apps? Is it possible to use the old (IIRC 0.24 CUDA23) app via app_info.xml? Or does this app not work with the new WUs?" Please read this: http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=1505&nowrap=true#46230 CUDA should still work.