Error while computing again and again

Ano

Joined: 29 Nov 09
Posts: 1
Credit: 824,834
RAC: 0
Message 45823 - Posted: 27 Jan 2011, 16:02:43 UTC

Hi,

I recently updated my NVIDIA driver to work with the 0.50 GPU app, and it worked for some time, but now I'm getting "error while computing" over and over again, like before.

It doesn't seem to be the same problem, though, since others appear to be crashing with CPU versions of MilkyWay.

Here are my latest workunits:
http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=226592741
http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=226587309
http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=226576105
http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=226572017

Why are those workunits not working?
ID: 45823
rcthardcore

Joined: 30 Dec 08
Posts: 30
Credit: 6,999,702
RAC: 0
Message 45833 - Posted: 27 Jan 2011, 21:44:32 UTC

Same results for me too. Computation errors every time. Somebody needs to check on this ASAP!
ID: 45833
Len LE/GE

Joined: 8 Feb 08
Posts: 261
Credit: 104,050,322
RAC: 0
Message 45834 - Posted: 27 Jan 2011, 21:45:00 UTC

The CUDA version is known to have a problem with the de_separation_23_3s WUs.
The next version of the app should fix that.
ID: 45834
Mark Gallaher

Joined: 12 Sep 07
Posts: 2
Credit: 10,025,948
RAC: 0
Message 45848 - Posted: 28 Jan 2011, 7:39:37 UTC

For my two systems, I'm seeing:

- GTX260 errors on the _23_3s_fix series
- GTX460 success on the _23_3s_fix series

I'll keep an eye on it, but so *far* that seems like it might be a clue: my Fermi card works, but the older one doesn't.
ID: 45848
Bill Walker

Joined: 19 Aug 09
Posts: 23
Credit: 631,303
RAC: 0
Message 45868 - Posted: 29 Jan 2011, 3:15:39 UTC

I've recently started N-body work on a CPU, after upgrading from an optimized app. I'm seeing lots of computation errors as well, some after a long time crunching. Anyone have any ideas?

http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=226730964
http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=226593122
http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=225091853
ID: 45868
XJR-Maniac

Joined: 18 Oct 07
Posts: 35
Credit: 4,684,314
RAC: 0
Message 45881 - Posted: 30 Jan 2011, 12:42:46 UTC

Same issue for me on MW v0.50 with an NVIDIA GTX 260. All tasks run up to 100% and crash at the finish line. I just updated the driver to 266.58 on one machine, to no avail. I checked my wingmen and it seems that MW v0.23 on ATI is crashing, too.

Here's the log:

<core_client_version>6.10.17</core_client_version>
<![CDATA[
<message>
Unzulässige Funktion. (0x1) - exit code 1 (0x1)  [German: "Invalid function."]
</message>
<stderr_txt>
<search_application> milkywayathome separation 0.50 Windows x86 double OpenCL </search_application>
Found 1 platforms
Platform 0 information:
  Platform name:       NVIDIA CUDA
  Platform version:    OpenCL 1.0 CUDA 3.2.1
  Platform vendor:     
  Platform profile:    
  Platform extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll 
Using device 0 on platform 0
Found 2 CL devices
Device GeForce GTX 260 (NVIDIA Corporation:0x10de)
Type:                CL_DEVICE_TYPE_GPU
Driver version:      266.58
Version:             OpenCL 1.0 CUDA
Compute capability:  1.3
Little endian:       CL_TRUE
Error correction:    CL_FALSE
Image support:       CL_TRUE
Address bits:        32
Max compute units:   27
Clock frequency:     1104 Mhz
Global mem size:     939327488
Max mem alloc:       234831872
Global mem cache:    0
Cacheline size:      0
Local mem type:      CL_LOCAL
Local mem size:      16384
Max const args:      9
Max const buf size:  65536
Max parameter size:  4352
Max work group size: 512
Max work item dim:   3
Max work item sizes: { 512, 512, 64 }
Mem base addr align: 2048
Min type align size: 128
Timer resolution:    1000 ns
Double extension:    MW_CL_KHR_FP64
Extensions:          cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll  cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 
Found a compute capability 1.3 device. Using -cl-nv-maxrregcount=32 

Compiler flags:
-cl-mad-enable -cl-no-signed-zeros -cl-strict-aliasing -cl-finite-math-only -DUSE_CL_MATH_TYPES=0 -DUSE_MAD=1 -DUSE_FMA=0 -cl-nv-verbose  -cl-nv-maxrregcount=32  -DDOUBLEPREC=1 -DMILKYWAY_MATH_COMPILATION -DNSTREAM=3 -DFAST_H_PROB=1 -DAUX_BG_PROFILE=0 -DUSE_IMAGES=1 -DI_DONT_KNOW_WHY_THIS_DOESNT_WORK_HERE=0  

Build status: CL_BUILD_SUCCESS
Build log: 

: Considering profile 'compute_13' for gpu='sm_13' in 'cuModuleLoadDataEx_4'
: Retrieving binary for 'cuModuleLoadDataEx_4', for gpu='sm_13', usage mode='  --verbose --maxrregcount 32  '
: Considering profile 'compute_13' for gpu='sm_13' in 'cuModuleLoadDataEx_4'
: Control flags for 'cuModuleLoadDataEx_4' disable search path
: Ptx binary found for 'cuModuleLoadDataEx_4', architecture='compute_13'
: Ptx compilation for 'cuModuleLoadDataEx_4', for gpu='sm_13', ocg options='  --verbose --maxrregcount 32  '
ptxas info    : Compiling entry function 'mu_sum_kernel' for 'sm_13'
ptxas info    : Used 32 registers, 800+0 bytes lmem, 48+16 bytes smem, 56 bytes cmem[1], 4 bytes cmem[2], 4 bytes cmem[3], 4 bytes cmem[4], 4 bytes cmem[5], 4 bytes cmem[6]
Kernel work group info:
  Work group size = 512
  Kernel local mem size = 64
  Compile work group size = { 0, 0, 0 }
Group size = 64, per CU = 8, threads per CU = 512
Block size = 13824
Desired = 163
Min sol: 163 13312
Lower n solution: n = 163, x = 13312
Higher n solution: n = 163, x = 13312
Using solution: n = 163, x = 13312
Range:          { nu_steps = 640, mu_steps = 1600, r_steps = 1400 }
Iteration area: 2240000
Chunk estimate: 163
Num chunks:     163
Added area:     13312
Effective area: 2253312
Integration time: 957.124801 s. Average time per iteration = 1495.507502 ms
Kernel work group info:
  Work group size = 512
  Kernel local mem size = 64
  Compile work group size = { 0, 0, 0 }
Group size = 64, per CU = 8, threads per CU = 512
Block size = 13824
Desired = 21
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Min sol: 1 0
Didn't find a solution. Using fallback solution n = 20, x = 0
Using solution: n = 20, x = 0
Range:          { nu_steps = 160, mu_steps = 400, r_steps = 700 }
Iteration area: 280000
Chunk estimate: 21
Num chunks:     20
Added area:     0
Effective area: 280000
Global dimensions not divisible by local
Failed to find good run sizes
Failed to calculate integral 1
12:30:48 (2372): called boinc_finish

</stderr_txt>
]]>


It can't be a BOINC client version issue, because the ATI hosts are crashing on 6.10.56 while I'm using 6.10.17.

I'll set my boxes to NNW (no new work) until further notice.
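
For anyone wondering about the "Global dimensions not divisible by local" line near the end of that log: under OpenCL 1.0/1.1 (which is what the NVIDIA platform reports here), every global work dimension has to be an exact multiple of the local work-group size, otherwise clEnqueueNDRangeKernel rejects the launch with CL_INVALID_WORK_GROUP_SIZE. I don't know exactly how the separation app slices its chunks, but if it simply divides the effective area by the fallback chunk count, the numbers in the log already break that rule. A rough back-of-the-envelope check in plain C (not the project's code; the 14000-items-per-chunk figure is only my guess):

#include <stdio.h>

int main(void)
{
    /* Numbers taken from the log above; the per-chunk size is an assumption. */
    const size_t local_size  = 64;       /* "Group size = 64"                 */
    const size_t area        = 280000;   /* "Effective area: 280000"          */
    const size_t num_chunks  = 20;       /* fallback solution "n = 20, x = 0" */
    const size_t global_size = area / num_chunks;   /* 14000 items per launch */

    if (global_size % local_size != 0) {
        /* An OpenCL 1.x runtime would refuse such an NDRange with
           CL_INVALID_WORK_GROUP_SIZE before the kernel ever runs. */
        printf("Global dimensions not divisible by local: %zu %% %zu = %zu\n",
               global_size, local_size, global_size % local_size);
        return 1;
    }
    printf("Launch sizes OK: %zu items in groups of %zu\n", global_size, local_size);
    return 0;
}

That would also fit the rest of the log: the first integration found a padded solution (n = 163, x = 13312, giving 163 chunks of 13824 items each, a clean multiple of 64) and ran to completion in about 957 s, while the second fell back to n = 20, x = 0 and died before computing anything, hence "Failed to calculate integral 1".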

ID: 45881
Len LE/GE

Joined: 8 Feb 08
Posts: 261
Credit: 104,050,322
RAC: 0
Message 45883 - Posted: 30 Jan 2011, 14:01:45 UTC

WU: de_separation_23_3s
GPU: NVIDIA GTX 260

Looks like the problem with GTX2xx cards and de_separation_23_3s WUs that Matt is aware of and is going to fix in the next version.
Other WUs should run on your GPU.
The only errored WU I could still find in your list is Workunit 228247254, and there an ATI card (HD5850?) has finished the WU and is waiting for validation (not crashed).
ID: 45883
XJR-Maniac

Joined: 18 Oct 07
Posts: 35
Credit: 4,684,314
RAC: 0
Message 45886 - Posted: 30 Jan 2011, 17:27:37 UTC - in response to Message 45883.  

Yes, you're right. They have all been de_separation_23_3s WUs; the last one just crashed. So I will wait for the fix. In the meantime, there are other projects waiting for my GPUs ;-)))




ID: 45886
