Welcome to MilkyWay@home

testing work generation with 'ps_separation_14_2s_null_3'

Message boards : News : testing work generation with 'ps_separation_14_2s_null_3'

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 54603 - Posted: 1 Jun 2012, 23:48:28 UTC - in response to Message 54602.  

Quote:
GPU-only tasks tested...

PC #A
Just completed a ps_separation_09 task, OK.
Everything else fails with a computation error at 100% completion: ps_separation_14_2s_null_3_v2, v3, v4 and ps_separation_14_2s_05_03.
I have restarted, cold booted and detached/reattached. Same problem as above.
Result:
Will suspend project and abort existing ps_separation_14 tasks until a fix is in place.

PC #B (only now switched it on after a couple of days, so no recent tasks have been loaded yet)
Completing ps_separation_09 tasks (7 of them so far), OK.
Result:
I've switched to "No new tasks" for now.

For those with headless/unattended servers, you're either going to be busy for a while or waste a lot of electricity doing nothing until a fix is found.


Very strange; it looks like they ran successfully. I have no clue why the client would have marked them as errors. I've sent Matt A. a message, so hopefully he'll know what the issue is.
ID: 54603

Ray_GTI-R
Joined: 5 Nov 10
Posts: 69
Credit: 15,064,831
RAC: 0
Message 54604 - Posted: 2 Jun 2012, 0:22:02 UTC - in response to Message 54603.  
Last modified: 2 Jun 2012, 0:57:18 UTC

To be clear:

PC #A is 446288 (since about 22:00 UTC on 1 June, every WU except one old one has failed).
PC #B is 231173 (all old tasks completed OK; no new tasks for days, as it was switched off).

Can you point me to where you see PC #A (446288, since about 22:00 UTC on 1 June) successfully process any GPU task other than that one [old] task?
Apologies if I've got any of this wrong. I'm not an expert, just a cruncher.
ID: 54604

Robert Gammon
Joined: 29 Nov 10
Posts: 4
Credit: 4,783,425
RAC: 0
Message 54605 - Posted: 2 Jun 2012, 0:27:10 UTC - in response to Message 54597.  

Just had 6 WUs abort with computation errors; all were ps_separation null_3 v4 units.

This machine had zero problems with any WUs prior to today.
ID: 54605

Dataman
Joined: 5 Sep 08
Posts: 28
Credit: 245,585,043
RAC: 0
Message 54607 - Posted: 2 Jun 2012, 13:32:19 UTC

Same here ... all errored out. I will leave a couple of cards here in case you need some testers. Good luck!

ID: 54607

^..^~~
Joined: 22 Oct 11
Posts: 23
Credit: 71,023,220
RAC: 0
Message 54608 - Posted: 2 Jun 2012, 14:44:40 UTC

My systems are not able to get new work units to even check and see if and when things are fixed.
^..^~~
ID: 54608

Angelika
Joined: 25 May 11
Posts: 3
Credit: 299,588
RAC: 0
Message 54609 - Posted: 2 Jun 2012, 15:54:26 UTC

All 4 WUs errored out, and I cannot get new tasks.
ID: 54609

^..^~~
Joined: 22 Oct 11
Posts: 23
Credit: 71,023,220
RAC: 0
Message 54610 - Posted: 2 Jun 2012, 16:23:56 UTC

I'd say "the pooch got screwed" with this update! Ha!
Too funny!

^..^~~
ID: 54610

Dataman
Joined: 5 Sep 08
Posts: 28
Credit: 245,585,043
RAC: 0
Message 54611 - Posted: 2 Jun 2012, 17:10:11 UTC

Just a personal request, but if you can turn on whatever is necessary for us to upload completed WUs and errors, it would be most helpful. I was running quite a number of cards when the problems occurred. When I look at my master console with BoincTasks, all those red lines give me heart palpitations. Hahaha

No worries if that is not feasible. Have a good weekend.

ID: 54611

BarryAZ
Joined: 1 Sep 08
Posts: 520
Credit: 302,524,931
RAC: 15
Message 54613 - Posted: 2 Jun 2012, 17:14:35 UTC

I just did the suspend thing. Once we 'return to the future' here, I'll likely do project resets or detach and rejoin, just to clear out the cobwebs.

Perhaps some information might be useful though.
ID: 54613

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 54615 - Posted: 2 Jun 2012, 17:54:07 UTC - in response to Message 54603.  

So Matt A. says this is most likely a problem with using an older version of the ATI application.

ID: 54615

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 54616 - Posted: 2 Jun 2012, 17:54:49 UTC - in response to Message 54611.  

Things are back on; hopefully I'll get some more information about why some clients are erroring out on the workunits.
ID: 54616

Sunny129
Joined: 25 Jan 11
Posts: 271
Credit: 346,072,284
RAC: 0
Message 54618 - Posted: 2 Jun 2012, 18:26:42 UTC

I've run ~10 tasks since the server went back up, and only one errored out (a null_3 task). My first de_separation task crunched without error, though.
ID: 54618

tomast
Joined: 9 May 12
Posts: 12
Credit: 10,339,447
RAC: 0
Message 54619 - Posted: 2 Jun 2012, 18:38:48 UTC
Last modified: 2 Jun 2012, 18:58:41 UTC

We have run over 80 on AMD GPU today (no errors).
At least a hundred went through yesterday, and the only errors
were a couple of _v2 that sneaked in. ;-)

Even ran 8 through my much slower NVIDIA GPU (no errors).
ID: 54619

tomast
Joined: 9 May 12
Posts: 12
Credit: 10,339,447
RAC: 0
Message 54620 - Posted: 2 Jun 2012, 20:13:38 UTC
Last modified: 2 Jun 2012, 20:27:10 UTC

Going good today, but... another _v2 just slipped through
and, as before, failed with a computation error right at the end. [stars file]
Too bad they don't error at the start...
Mine only took 40.35 s on GPU, but my wingman took 28,298.82 s on CPU. Ouch!
http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=178264077
Can't these _v2 be weeded out rather than just sending more copies?
ID: 54620

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 54621 - Posted: 2 Jun 2012, 20:30:05 UTC - in response to Message 54620.  


Quote:
Can't these _v2 be weeded out rather than just sending more copies?


Done, they shouldn't be getting sent out anymore.
ID: 54621

Ray_GTI-R
Joined: 5 Nov 10
Posts: 69
Credit: 15,064,831
RAC: 0
Message 54622 - Posted: 2 Jun 2012, 20:36:00 UTC

Nope, still erroring. Details below...

Task: 225600128
Name: ps_separation_14_2s_null_3_v4_1338660883_41090_1
Workunit: 178349109
Created: 2 Jun 2012, 20:15:08 UTC
Sent: 2 Jun 2012, 20:15:53 UTC
Received: 2 Jun 2012, 20:32:07 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: 1 (0x1) Unknown error number
Computer ID: 231173
Report deadline: 14 Jun 2012, 20:15:53 UTC
Run time: 448.03 s
CPU time: 4.86 s
Validate state: Invalid
Credit: 0.00
Application version: MilkyWay@Home v0.82 (ati14)

Stderr output
<core_client_version>6.12.34</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
Using SSE2 path
Found 1 CAL devices
Chose device 0

Device target: CAL_TARGET_670
Revision: 41
CAL Version: 1.4.1523
Engine clock: 668 Mhz
Memory clock: 828 Mhz
GPU RAM: 512
Wavefront size: 64
Double precision: CAL_TRUE
Compute shader: CAL_FALSE
Number SIMD: 4
Number shader engines: 1
Pitch alignment: 256
Surface alignment: 256
Max size 2D: { 8192, 8192 }

Estimated iteration time 660.343376 ms
Target frequency 30.000000 Hz, polling mode 1
Dividing into 19 chunks, initially sleeping for 0 ms
Integration range: { nu_steps = 640, mu_steps = 1600, r_steps = 1400 }
Using 19 chunk(s) with sizes: 80 80 80 96 80 80 80 96 80 80 80 96 80 80 96 80 80 80 96
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Integration time = 442.362278 s, average per iteration = 691.191059 ms
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Integral 0 time = 446.066926 s
Failed to calculate integral 0
21:31:13 (2680): called boinc_finish

</stderr_txt>
]]>
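Both stderr logs in this thread integrate the same grid (mu_steps = 1600) but split it differently: 19 chunks here (target 30 Hz, estimated iteration 660.343376 ms) and 73 chunks in the later HD38x0 log (target 120 Hz, 612.651910 ms). The Python sketch below reproduces those numbers under two assumptions of ours that are inferred only from the logged values, not taken from the MilkyWay@home source: the chunk count is floor(estimated iteration time x target frequency), and mu_steps is cut into blocks of 16 spread as evenly as possible over the chunks. It models how many chunks of each size appear, not their order.

```python
import math
from collections import Counter

# Hypothetical reconstruction, inferred only from the two logs in this thread:
#   1. chunks = floor(estimated iteration time [ms] * target frequency [Hz] / 1000)
#   2. mu_steps is cut into blocks of 16 that are spread as evenly as possible.
# Neither rule is taken from the MilkyWay@home source code.
BLOCK = 16  # every logged chunk size (16, 32, 80, 96) is a multiple of 16


def chunk_count(est_iter_ms: float, target_hz: float) -> int:
    """Chunks needed so that each chunk takes at most about 1/target_hz seconds."""
    return math.floor(est_iter_ms * target_hz / 1000.0)


def chunk_size_counts(mu_steps: int, n_chunks: int) -> dict:
    """Return {chunk size: number of chunks of that size}; order is not modelled."""
    blocks = mu_steps // BLOCK
    base, extra = divmod(blocks, n_chunks)
    counts = {base * BLOCK: n_chunks - extra}
    if extra:
        counts[(base + 1) * BLOCK] = extra
    return counts


# Chunk sizes copied from the stderr log above (19 chunks, 30 Hz target).
LOGGED_19 = [80, 80, 80, 96, 80, 80, 80, 96, 80, 80, 80, 96,
             80, 80, 96, 80, 80, 80, 96]

if __name__ == "__main__":
    # Values from "Estimated iteration time" and "Target frequency" in this log.
    n = chunk_count(660.343376, 30.0)
    print(n, chunk_size_counts(1600, n), dict(Counter(LOGGED_19)), sum(LOGGED_19))
    # Values from the later 120 Hz log.
    m = chunk_count(612.651910, 120.0)
    print(m, chunk_size_counts(1600, m))
```

Under these assumptions the computed size counts match both logs: 14 x 80 + 5 x 96 for the 19-chunk run and 46 x 16 + 27 x 32 for the 73-chunk run, each adding up to mu_steps = 1600. That suggests the chunking itself behaves the same on both machines, so the differing failures are more likely elsewhere.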


--------------------------------------------------------------------------------
ID: 54622

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 54623 - Posted: 2 Jun 2012, 20:41:30 UTC - in response to Message 54622.  

Matt A. says it's because of the old version of your ATI application.
ID: 54623

Link
Joined: 19 Jul 10
Posts: 578
Credit: 18,845,143
RAC: 932
Message 54624 - Posted: 2 Jun 2012, 20:50:54 UTC - in response to Message 54623.  
Last modified: 2 Jun 2012, 20:56:07 UTC

Quote:
Matt A. says it's because of the old version of your ATI application.

Does that basically mean that the new tasks are incompatible with HD38x0 cards?
ID: 54624

Link
Joined: 19 Jul 10
Posts: 578
Credit: 18,845,143
RAC: 932
Message 54629 - Posted: 2 Jun 2012, 22:02:09 UTC - in response to Message 54624.  
Last modified: 2 Jun 2012, 22:09:55 UTC

OK, they all error out; however, this is the stderr I get:

<core_client_version>6.12.34</core_client_version>
<![CDATA[
<message>
- exit code -1073740940 (0xc0000374)
</message>
<stderr_txt>
Using SSE3 path
Found 1 CAL devices
Chose device 0

Device target: CAL_TARGET_670
Revision: 41
CAL Version: 1.4.1546
Engine clock: 720 Mhz
Memory clock: 900 Mhz
GPU RAM: 512
Wavefront size: 64
Double precision: CAL_TRUE
Compute shader: CAL_FALSE
Number SIMD: 4
Number shader engines: 1
Pitch alignment: 256
Surface alignment: 4096
Max size 2D: { 8192, 8192 }

Estimated iteration time 612.651910 ms
Target frequency 120.000000 Hz, polling mode 4
Dividing into 73 chunks, initially sleeping for 0 ms
Integration range: { nu_steps = 640, mu_steps = 1600, r_steps = 1400 }
Using 73 chunk(s) with sizes: 16 16 32 16 16 32 16 32 16 16 32 16 16 32 16 16 32 16 32 16 16 32 16 16 32 16 32 16 16 32 16 16 32 16 32 16 16 32 16 16 32 16 16 32 16 32 16 16 32 16 16 32 16 32 16 16 32 16 16 32 16 16 32 16 32 16 16 32 16 16 32 16 32
Integration time = 817.866787 s, average per iteration = 1277.916855 ms
Integral 0 time = 819.342291 s
Likelihood time = 2.521662 s
<background_integral> 0.000697824013171 </background_integral>
<stream_integral> 1358.169746558015000 423.166163591165340 </stream_integral>
<background_likelihood> -2.988655550325422 </background_likelihood>
<stream_only_likelihood> -7.295573957091166 -9.251895081428136 </stream_only_likelihood>
<search_likelihood> -2.988610218899344 </search_likelihood>

</stderr_txt>
]]>

So it looks quite normal; only these lines

<search_application> milkywayathome_client separation 0.82 Windows x86_64 double CAL++ </search_application>
08:59:27 (4784): called boinc_finish

are missing after "</search_likelihood>".

EDIT: OK, I've suspended the rest of those tasks for now and will crunch Collatz again until you (hopefully) fix this. Would generating a few more of those errors with the remaining WUs I have here be of any use to you in finding the problem, or shall I just abort them?
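Link's diagnosis comes down to two checks on the stderr text. First, the exit code: BOINC prints it as a signed decimal (-1073740940), which is 0xc0000374 as an unsigned 32-bit value, a status Windows names STATUS_HEAP_CORRUPTION. By contrast, Ray's task exited with code 1 after repeated CAL "Failed to map resource" errors. Second, whether the closing result lines are present. The Python sketch below performs both checks; the helper names and the list of expected tags are our own assumptions, taken from the lines Link reports missing, and are not part of any MilkyWay@home or BOINC tool.

```python
import re
from typing import Optional

# Hypothetical helpers; the tag list mirrors the lines reported missing above,
# and whether the validator needs exactly these tags is an assumption.
EXPECTED_TAGS = ("search_likelihood", "search_application")


def exit_code(stderr: str) -> Optional[int]:
    """Pull the decimal exit code out of BOINC's <message> text, if present."""
    match = re.search(r"exit code (-?\d+)", stderr)
    return int(match.group(1)) if match else None


def as_unsigned_hex(code: int) -> str:
    """Render a signed 32-bit exit code in the hex form shown in parentheses."""
    return f"0x{code & 0xFFFFFFFF:08x}"


def missing_tags(stderr: str) -> list:
    """Return the expected result tags that never appear in the stderr text."""
    return [tag for tag in EXPECTED_TAGS if f"<{tag}>" not in stderr]


# Excerpt of the stderr quoted above (most lines omitted).
SAMPLE = """<message>
- exit code -1073740940 (0xc0000374)
</message>
<search_likelihood> -2.988610218899344 </search_likelihood>
"""

if __name__ == "__main__":
    code = exit_code(SAMPLE)
    print(code, as_unsigned_hex(code), missing_tags(SAMPLE))
```

Run against the excerpt, the sketch recovers the 0xc0000374 status and flags only search_application as absent, which matches what Link observed by eye.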
ID: 54629

Ray_GTI-R
Joined: 5 Nov 10
Posts: 69
Credit: 15,064,831
RAC: 0
Message 54630 - Posted: 2 Jun 2012, 22:23:18 UTC - in response to Message 54623.  
Last modified: 2 Jun 2012, 22:24:42 UTC

To recap, on my PCs:
The current batch(es) of GPU WUs fail.
The previous batches ran successfully.
Others report the same issue.

There has been no ATI application/driver change at this end.

I thought the only change was to create new work, i.e.:
Quote:
I'm testing work generation right now, there should be workunits available to download now. Let me know how these workunits are crunching!

Quote:
Matt A. says it's because of the old version of your ATI application.

Has there been an unannounced update to MW@H crunching requirement(s)?
ID: 54630

©2024 Astroinformatics Group