Milkyway@Home Separation (Modified Fit) v1.28 (opencl_nvidia) crashes on Titan

Message boards : Number crunching : Milkyway@Home Separation (Modified Fit) v1.28 (opencl_nvidia) crashes on Titan

_heinz
Joined: 23 Feb 09
Posts: 28
Credit: 10,775,220
RAC: 0
Message 61514 - Posted: 15 Apr 2014, 13:10:01 UTC

I use [3] NVIDIA GeForce GTX TITAN (4095MB) driver: 335.23 OpenCL: 1.01
and the app crashes continuously on every WU.
Have a look here:
my wus
my hostid=571719
Any idea what I can do?
Or is it a bug?

Regards, heinz
ID: 61514
swiftmallard
Joined: 18 Jul 09
Posts: 300
Credit: 303,565,482
RAC: 0
Message 61515 - Posted: 15 Apr 2014, 13:28:18 UTC

I'm not an NVIDIA guy, but this may be a driver issue. Don't lose heart; one of the experienced NVIDIA crunchers will post here soon.
ID: 61515
Len LE/GE
Joined: 8 Feb 08
Posts: 261
Credit: 104,050,322
RAC: 0
Message 61517 - Posted: 16 Apr 2014, 0:14:57 UTC

Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (15100): (null)
Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (15105): (null)
Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (15105): (null)
Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (15105): (null)
Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (15105): (null)
Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (15105): (null)
Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (15105): (null)
Failed to update checkpoint file ('separation_checkpoint_tmp' to 'separation_checkpoint') (34): Result too large
Integration time: 57.212246 s. Average time per iteration = 89.394134 ms
Integral 0 time = 63.559289 s
Failed to calculate integral 0
Failed to calculate likelihood
<background_integral> 1.#QNAN0000000000 </background_integral>
<stream_integral> 1.#QNAN0000000000 1.#QNAN0000000000 1.#QNAN0000000000 </stream_integral>
<background_likelihood> 1.#QNAN0000000000 </background_likelihood>
<stream_only_likelihood> 1.#QNAN0000000000 1.#QNAN0000000000 1.#QNAN0000000000 </stream_only_likelihood>
<search_likelihood> 1.#QNAN0000000000 </search_likelihood>

Not an NVIDIA guy either, but the checkpoint errors make me wonder. Why can't the program write the checkpoint, and why does it even want to write one after only about a minute of runtime?
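The file names in the log suggest the app writes its checkpoint to `separation_checkpoint_tmp` first and then renames it over `separation_checkpoint`. Below is a minimal Python sketch of that common write-then-rename pattern with a retry loop, assuming that is roughly what the app does. It is not MilkyWay@home's code; `write_checkpoint`, `retries`, and `delay` are hypothetical names.

```python
import os
import time


def write_checkpoint(path: str, data: bytes, retries: int = 5, delay: float = 0.1) -> None:
    """Hypothetical helper: write data to a temp file, then rename it over path."""
    tmp_path = path + "_tmp"  # same naming as 'separation_checkpoint_tmp' in the log
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to disk before the rename
    last_err = None
    for _ in range(retries):
        try:
            # os.replace overwrites an existing destination; on POSIX the rename is
            # atomic, so a reader sees either the old checkpoint or the new one
            os.replace(tmp_path, path)
            return
        except OSError as err:
            last_err = err
            print(f"Failed to move file '{tmp_path}' to '{path}': {err}")
            time.sleep(delay)  # brief pause before retrying
    raise OSError(
        f"Failed to update checkpoint file ('{tmp_path}' to '{path}')"
    ) from last_err
```

Under these assumptions, a run of "Failed to move file" lines ending in a single "Failed to update checkpoint file" message is what a retry loop that eventually gives up would print.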
ID: 61517
mikey
Joined: 8 May 09
Posts: 3339
Credit: 524,010,781
RAC: 0
Message 61518 - Posted: 16 Apr 2014, 10:54:40 UTC - in response to Message 61514.  

I can't find where I read it, but I read somewhere that you need at least the 341.xx driver to crunch the new cuda60 units. You are running version 335.23, so you might try upgrading and see if that helps.

As for checkpointing every 60 seconds, that is the BOINC default. A lot of people change it to 900 seconds (15 minutes) to save their hard drives, but that is a manual change.
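To make the 60-versus-900-second trade-off concrete, here is a small Python model of a "checkpoint at most every N seconds" throttle. In a real BOINC app the check goes through the BOINC API (`boinc_time_to_checkpoint()` and `boinc_checkpoint_completed()`); the `CheckpointTimer` class and `count_checkpoints` helper below are hypothetical stand-ins, not BOINC's implementation, and they assume the task polls the timer once per second.

```python
import time
from typing import Callable


class CheckpointTimer:
    """Hypothetical model of a 'checkpoint at most every N seconds' throttle."""

    def __init__(self, interval_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval_s = interval_s  # 60 s mirrors the BOINC default mentioned above
        self.clock = clock
        self.last = clock()           # in this model, task start counts as the last checkpoint

    def time_to_checkpoint(self) -> bool:
        # True once at least interval_s seconds have passed since the last checkpoint
        return self.clock() - self.last >= self.interval_s

    def checkpoint_completed(self) -> None:
        # Reset the timer after the app has written its checkpoint
        self.last = self.clock()


def count_checkpoints(runtime_s: int, interval_s: float) -> int:
    """Simulate a task that polls the timer once per second for runtime_s seconds."""
    now = [0.0]  # fake clock so the simulation runs instantly
    timer = CheckpointTimer(interval_s, clock=lambda: now[0])
    written = 0
    for second in range(1, runtime_s + 1):
        now[0] = float(second)
        if timer.time_to_checkpoint():
            written += 1              # the app would write its checkpoint file here
            timer.checkpoint_completed()
    return written
```

In this model a one-hour task writes 60 checkpoints with `count_checkpoints(3600, 60)` but only 4 with `count_checkpoints(3600, 900)`, which is the disk-write saving described above. With the 60-second default, a task that runs for just over a minute, like the one in the log above, already reaches its first checkpoint (`count_checkpoints(63, 60)` returns 1).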
ID: 61518
arkayn
Joined: 14 Feb 09
Posts: 999
Credit: 74,932,619
RAC: 0
Message 61519 - Posted: 16 Apr 2014, 16:34:05 UTC - in response to Message 61518.  

The most recent beta driver from NVIDIA is 337.50, and CUDA 6 has been available since the 331.xx drivers.
ID: 61519
mikey
Joined: 8 May 09
Posts: 3339
Credit: 524,010,781
RAC: 0
Message 61527 - Posted: 17 Apr 2014, 11:47:13 UTC - in response to Message 61519.  

I must be confused then, sorry!
ID: 61527

©2024 Astroinformatics Group