Welcome to MilkyWay@home

Nvidia GPU tasks crashing after 2 seconds


Questions and Answers : Windows : Nvidia GPU tasks crashing after 2 seconds

[AF>Libristes] Kao

Message 69617 - Posted: 23 Mar 2020, 9:54:04 UTC

Hi,
I recently tried to crunch a bit for MilkyWay@home but ran into nothing but issues. I received a lot of tasks, but all of them failed except the 8-CPU ones.
To be more precise, all the Nvidia tasks failed with the following error:
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
The global filename characters, * or ?, are entered incorrectly or too many global filename characters are specified.
 (0xd0) - exit code 208 (0xd0)</message>
<stderr_txt>
<search_application> milkyway_separation 1.46 Windows x86 double OpenCL </search_application>
Reading preferences ended prematurely
BOINC GPU type suggests using OpenCL vendor 'NVIDIA Corporation'
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4' 
Switching to Parameter File 'astronomy_parameters.txt'
<number_WUs> 5 </number_WUs>
<number_params_per_WU> 20 </number_params_per_WU>
Using SSE4.1 path
Found 1 platform
Platform 0 information:
  Name:       NVIDIA CUDA
  Version:    OpenCL 1.2 CUDA 10.2.150
  Vendor:     NVIDIA Corporation
  Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics
  Profile:    FULL_PROFILE
Using device 0 on platform 0
Found 1 CL device
Device 'GeForce GTX 980M' (NVIDIA Corporation:0x10de) (CL_DEVICE_TYPE_GPU)
Board: 
Driver version:      442.74
Version:             OpenCL 1.2 CUDA
Compute capability:  5.2
Max compute units:   12
Clock frequency:     1126 Mhz
Global mem size:     4294967296
Local mem size:      49152
Max const buf size:  65536
Double extension:    cl_khr_fp64
Error creating contextTrying to show unknown cl_int 208
 (208): Unknown cl_int
Error getting device and contextTrying to show unknown cl_int 208
 (208): Unknown cl_int
Failed to calculate likelihood
[The block from "Using SSE4.1 path" through "Failed to calculate likelihood" repeats four more times, unchanged.]
09:00:56 (22696): called boinc_finish(208)

</stderr_txt>
]]>
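A note on the 208 in the log above: as far as I know, the status codes defined by the OpenCL specification are zero (CL_SUCCESS) or negative, which would explain why the application prints "Unknown cl_int" for a positive value. The sketch below pulls the reported codes out of a stderr excerpt and applies that rule. The function names are made up for illustration and are not part of BOINC or the MilkyWay application.

```python
import re
from typing import List

# Matches lines such as " (208): Unknown cl_int" from the stderr output above.
STATUS_LINE = re.compile(r"^\s*\((-?\d+)\): Unknown cl_int", re.MULTILINE)


def extract_status_codes(stderr_text: str) -> List[int]:
    """Return every status code the application reported as an unknown cl_int."""
    return [int(code) for code in STATUS_LINE.findall(stderr_text)]


def looks_like_opencl_status(code: int) -> bool:
    """Assumption: standard OpenCL status codes are 0 (CL_SUCCESS) or negative."""
    return code <= 0


sample = """Error creating contextTrying to show unknown cl_int 208
 (208): Unknown cl_int
Error getting device and contextTrying to show unknown cl_int 208
 (208): Unknown cl_int
"""

codes = extract_status_codes(sample)
for code in sorted(set(codes)):
    kind = "an OpenCL-style" if looks_like_opencl_status(code) else "not an OpenCL-style"
    print(f"{code} (0x{code:x}) is {kind} status code")
```

A positive value like this suggests the number came from somewhere other than the OpenCL runtime's own error list, which fits the Windows system message shown at the top of the task output. That reading is a guess from the log, not something the log confirms.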
bluestang

Message 69618 - Posted: 23 Mar 2020, 15:56:19 UTC - in response to Message 69617.  
Last modified: 23 Mar 2020, 15:58:46 UTC

Does your 980M GPU have double-precision (FP64) compute? If it doesn't, it will not work on this project.

Never mind, I see it does, at roughly 100 GFLOPS.
[AF>Libristes] Kao

Message 69619 - Posted: 23 Mar 2020, 22:13:19 UTC

I think it has something to do with CUDA/Driver version but I have no way to be sure...
mikey
Message 69620 - Posted: 24 Mar 2020, 0:12:19 UTC - in response to Message 69619.  

I think it has something to do with CUDA/Driver version but I have no way to be sure...


Roll it back to an older version and see if that works; I would go back a couple of versions. If the server doesn't know about the driver version, the work units will crash, and MW is not the fastest to update things.
Holdolin

Message 69621 - Posted: 24 Mar 2020, 3:49:40 UTC - in response to Message 69619.  

I think it has something to do with CUDA/Driver version but I have no way to be sure...

Well, SETI@home has the same problem with the most recent drivers under Windows. I would roll the driver back. If memory serves, the SETI project found that 436.x was the most recent driver that worked for crunching with Nvidia on Windows.
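If you want to check a host quickly, something like the following would do it. It is only a sketch: it reads the "Driver version:" line from a task's stderr output and compares it with the 436.x figure, which is from memory and not a confirmed cutoff. LAST_KNOWN_GOOD_MAJOR and the function names are placeholders made up for this example.

```python
import re
from typing import Optional, Tuple

# Assumption: 436.x is only the "if memory serves" figure, not a verified limit.
LAST_KNOWN_GOOD_MAJOR = 436


def parse_driver_version(stderr_text: str) -> Optional[Tuple[int, int]]:
    """Find a line like 'Driver version:      442.74' and return (442, 74)."""
    match = re.search(r"Driver version:\s*(\d+)\.(\d+)", stderr_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def newer_than_known_good(version: Tuple[int, int]) -> bool:
    """True when the driver's major number is above the assumed last good one."""
    return version[0] > LAST_KNOWN_GOOD_MAJOR


log_excerpt = "Driver version:      442.74\nVersion:             OpenCL 1.2 CUDA\n"
version = parse_driver_version(log_excerpt)
if version is not None and newer_than_known_good(version):
    print(
        f"Driver {version[0]}.{version[1]:02d} is newer than "
        f"{LAST_KNOWN_GOOD_MAJOR}.x; a rollback may be worth trying"
    )
```

Only the major number is compared, since the 436.x figure does not name a specific minor release.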
Keith Myers
Message 69624 - Posted: 25 Mar 2020, 16:08:59 UTC

There was a problem with recent Nvidia drivers, and we got Nvidia to fix it. Nvidia has since released newer drivers that apparently lack that fix.

Roll back.
Hurr1cane78

Message 69798 - Posted: 10 May 2020, 8:47:46 UTC

Hi all, I made a video on YouTube with instructions for running multiple instances at full load on a Radeon VII.
RADEON VII GIGABYTE// 3 Instances_ Milkyway@home WUs BOINC_ 3_instances
https://www.youtube.com/watch?v=4xKy9wGKmz4
All the best, and welcome to Earth.
mikey
Message 69802 - Posted: 10 May 2020, 16:47:26 UTC - in response to Message 69798.  
Last modified: 10 May 2020, 16:48:21 UTC

Hi all, I made a video on YouTube with instructions for running multiple instances at full load on a Radeon VII.
RADEON VII GIGABYTE// 3 Instances_ Milkyway@home WUs BOINC_ 3_instances
https://www.youtube.com/watch?v=4xKy9wGKmz4
All the best, and welcome to Earth.


SPAM

©2021 Astroinformatics Group