Error while computing again and again

Ano

Joined: 29 Nov 09
Posts: 1
Credit: 824,834
RAC: 0
Message 45823 - Posted: 27 Jan 2011, 16:02:43 UTC

Hi,

I recently updated my NVIDIA driver to work with the 0.50 GPU app, and it worked for some time, but now I'm getting "error while computing" over and over again, like before.

It doesn't seem to be the same problem, though, since others appear to be crashing with CPU versions of MilkyWay.

Here are my latest workunits:
http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=226592741
http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=226587309
http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=226576105
http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=226572017

Why are those workunits not working?
ID: 45823
rcthardcore

Joined: 30 Dec 08
Posts: 30
Credit: 6,999,702
RAC: 0
Message 45833 - Posted: 27 Jan 2011, 21:44:32 UTC

Same results for me too. Computation errors every time. Somebody needs to check on this ASAP!
ID: 45833
Len LE/GE

Joined: 8 Feb 08
Posts: 261
Credit: 104,050,322
RAC: 0
Message 45834 - Posted: 27 Jan 2011, 21:45:00 UTC

The CUDA version is known to have a problem with the de_separation_23_3s WUs.
The next version of the app should fix that.
ID: 45834
Mark Gallaher

Joined: 12 Sep 07
Posts: 2
Credit: 10,025,948
RAC: 0
Message 45848 - Posted: 28 Jan 2011, 7:39:37 UTC

For my two systems, I'm seeing:

- GTX260 errors on the _23_3s_fix series
- GTX460 success on the _23_3s_fix series

I'll keep an eye on it, but so *far* that seems like it might be a clue: my Fermi card works, but the older one doesn't.
ID: 45848
Bill Walker

Joined: 19 Aug 09
Posts: 23
Credit: 631,303
RAC: 0
Message 45868 - Posted: 29 Jan 2011, 3:15:39 UTC

I've recently started N-body work on a CPU, after upgrading from an optimized app. I'm seeing lots of computation errors as well, some after a long time crunching. Anyone have any ideas?

http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=226730964
http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=226593122
http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=225091853
ID: 45868
XJR-Maniac

Joined: 18 Oct 07
Posts: 35
Credit: 4,684,314
RAC: 0
Message 45881 - Posted: 30 Jan 2011, 12:42:46 UTC

Same issue for me on MW v0.50 with an NVIDIA GTX 260. All tasks run up to 100% and crash at the finish line. I just updated the driver to 266.58 on one machine, to no avail. I checked my wingmen and it seems that MW v0.23 on ATI is crashing, too.

Here's the log:

<core_client_version>6.10.17</core_client_version>
<![CDATA[
<message>
Unzulässige Funktion. (0x1) - exit code 1 (0x1)  [German: "Invalid function."]
</message>
<stderr_txt>
<search_application> milkywayathome separation 0.50 Windows x86 double OpenCL </search_application>
Found 1 platforms
Platform 0 information:
  Platform name:       NVIDIA CUDA
  Platform version:    OpenCL 1.0 CUDA 3.2.1
  Platform vendor:     
  Platform profile:    
  Platform extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll 
Using device 0 on platform 0
Found 2 CL devices
Device GeForce GTX 260 (NVIDIA Corporation:0x10de)
Type:                CL_DEVICE_TYPE_GPU
Driver version:      266.58
Version:             OpenCL 1.0 CUDA
Compute capability:  1.3
Little endian:       CL_TRUE
Error correction:    CL_FALSE
Image support:       CL_TRUE
Address bits:        32
Max compute units:   27
Clock frequency:     1104 Mhz
Global mem size:     939327488
Max mem alloc:       234831872
Global mem cache:    0
Cacheline size:      0
Local mem type:      CL_LOCAL
Local mem size:      16384
Max const args:      9
Max const buf size:  65536
Max parameter size:  4352
Max work group size: 512
Max work item dim:   3
Max work item sizes: { 512, 512, 64 }
Mem base addr align: 2048
Min type align size: 128
Timer resolution:    1000 ns
Double extension:    MW_CL_KHR_FP64
Extensions:          cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll  cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 
Found a compute capability 1.3 device. Using -cl-nv-maxrregcount=32 

Compiler flags:
-cl-mad-enable -cl-no-signed-zeros -cl-strict-aliasing -cl-finite-math-only -DUSE_CL_MATH_TYPES=0 -DUSE_MAD=1 -DUSE_FMA=0 -cl-nv-verbose  -cl-nv-maxrregcount=32  -DDOUBLEPREC=1 -DMILKYWAY_MATH_COMPILATION -DNSTREAM=3 -DFAST_H_PROB=1 -DAUX_BG_PROFILE=0 -DUSE_IMAGES=1 -DI_DONT_KNOW_WHY_THIS_DOESNT_WORK_HERE=0  

Build status: CL_BUILD_SUCCESS
Build log: 

: Considering profile 'compute_13' for gpu='sm_13' in 'cuModuleLoadDataEx_4'
: Retrieving binary for 'cuModuleLoadDataEx_4', for gpu='sm_13', usage mode='  --verbose --maxrregcount 32  '
: Considering profile 'compute_13' for gpu='sm_13' in 'cuModuleLoadDataEx_4'
: Control flags for 'cuModuleLoadDataEx_4' disable search path
: Ptx binary found for 'cuModuleLoadDataEx_4', architecture='compute_13'
: Ptx compilation for 'cuModuleLoadDataEx_4', for gpu='sm_13', ocg options='  --verbose --maxrregcount 32  '
ptxas info    : Compiling entry function 'mu_sum_kernel' for 'sm_13'
ptxas info    : Used 32 registers, 800+0 bytes lmem, 48+16 bytes smem, 56 bytes cmem[1], 4 bytes cmem[2], 4 bytes cmem[3], 4 bytes cmem[4], 4 bytes cmem[5], 4 bytes cmem[6]
Kernel work group info:
  Work group size = 512
  Kernel local mem size = 64
  Compile work group size = { 0, 0, 0 }
Group size = 64, per CU = 8, threads per CU = 512
Block size = 13824
Desired = 163
Min sol: 163 13312
Lower n solution: n = 163, x = 13312
Higher n solution: n = 163, x = 13312
Using solution: n = 163, x = 13312
Range:          { nu_steps = 640, mu_steps = 1600, r_steps = 1400 }
Iteration area: 2240000
Chunk estimate: 163
Num chunks:     163
Added area:     13312
Effective area: 2253312
Integration time: 957.124801 s. Average time per iteration = 1495.507502 ms
Kernel work group info:
  Work group size = 512
  Kernel local mem size = 64
  Compile work group size = { 0, 0, 0 }
Group size = 64, per CU = 8, threads per CU = 512
Block size = 13824
Desired = 21
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Didn't find a solution. Using fallback solution n = 20, x = 0
Using solution: n = 20, x = 0
Range:          { nu_steps = 160, mu_steps = 400, r_steps = 700 }
Iteration area: 280000
Chunk estimate: 21
Num chunks:     20
Added area:     0
Effective area: 280000
Global dimensions not divisible by local
Failed to find good run sizes
Failed to calculate integral 1
12:30:48 (2372): called boinc_finish

</stderr_txt>
]]>


It can't be a BOINC client version issue, because the ATI hosts are crashing on 6.10.56 while I'm using 6.10.17.

I'll set my boxes to NNW (no new work) until further notice.
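
For anyone wondering about the "Global dimensions not divisible by local" line near the end of that log: under OpenCL 1.0/1.1 (which is what the NVIDIA platform reports here), every global work dimension has to be an exact multiple of the local work-group size, otherwise clEnqueueNDRangeKernel rejects the launch with CL_INVALID_WORK_GROUP_SIZE. I don't know exactly how the separation app slices its chunks, but if it simply divides the effective area by the fallback chunk count, the numbers in the log already break that rule. A rough back-of-the-envelope check in plain C (not the project's code; the 14000-items-per-chunk figure is only my guess):

#include <stdio.h>

int main(void)
{
    /* Numbers taken from the log above; the per-chunk size is an assumption. */
    const size_t local_size  = 64;       /* "Group size = 64"                 */
    const size_t area        = 280000;   /* "Effective area: 280000"          */
    const size_t num_chunks  = 20;       /* fallback solution "n = 20, x = 0" */
    const size_t global_size = area / num_chunks;   /* 14000 items per launch */

    if (global_size % local_size != 0) {
        /* An OpenCL 1.x runtime would refuse such an NDRange with
           CL_INVALID_WORK_GROUP_SIZE before the kernel ever runs. */
        printf("Global dimensions not divisible by local: %zu %% %zu = %zu\n",
               global_size, local_size, global_size % local_size);
        return 1;
    }
    printf("Launch sizes OK: %zu items in groups of %zu\n", global_size, local_size);
    return 0;
}

That would also fit the rest of the log: the first integration found a padded solution (n = 163, x = 13312, giving 163 chunks of 13824 items each, a clean multiple of 64) and ran to completion in about 957 s, while the second fell back to n = 20, x = 0 and died before computing anything, hence "Failed to calculate integral 1".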

ID: 45881
Len LE/GE

Joined: 8 Feb 08
Posts: 261
Credit: 104,050,322
RAC: 0
Message 45883 - Posted: 30 Jan 2011, 14:01:45 UTC

WU: de_separation_23_3s
GPU: NVIDIA GTX 260

Looks like the problem with GTX2xx cards and de_separation_23_3s WUs that Matt is aware of and is going to fix in the next version.
Other WUs should run on your GPU.
The only errored WU I could still find in your list is Workunit 228247254, and there an ATI card (HD5850?) has finished the WU and is waiting for validation (not crashed).
ID: 45883
XJR-Maniac

Joined: 18 Oct 07
Posts: 35
Credit: 4,684,314
RAC: 0
Message 45886 - Posted: 30 Jan 2011, 17:27:37 UTC - in response to Message 45883.  

Yes, you're right. They have all been de_separation_23_3s WUs; the last one just crashed. So I will wait for the fix. In the meantime, there are other projects waiting for my GPUs ;-)))




ID: 45886
