testing work generation with 'ps_separation_14_2s_null_3'

Author Message
Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 54578 - Posted: 1 Jun 2012 | 18:00:30 UTC

I'm testing work generation right now, there should be workunits available to download now. Let me know how these workunits are crunching!

--Travis
____________

Phil
Send message
Joined: 29 Aug 10
Posts: 22
Credit: 902,842,906
RAC: 1,727,359
Message 54579 - Posted: 1 Jun 2012 | 18:01:49 UTC

I am getting computation errors on all WUs

Sebastian*
Send message
Joined: 8 Apr 09
Posts: 7
Credit: 1,397,794,362
RAC: 1,126,746
Message 54580 - Posted: 1 Jun 2012 | 18:02:43 UTC

Same here. They run like normal, but when they reach 100% they end with a Computation Error.

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 54581 - Posted: 1 Jun 2012 | 18:33:03 UTC - in response to Message 54579.

I am getting computation errors on all WUs


Think I fixed the problem, I generated a new batch of workunits, let me know how these are crunching.
____________

Profile tomast
Avatar
Send message
Joined: 9 May 12
Posts: 12
Credit: 10,339,447
RAC: 0
Message 54582 - Posted: 1 Jun 2012 | 18:41:14 UTC

Still getting computation errors
(Not had any errors before today.)

Incorrect function. (0x1) - exit code 1 (0x1)
Failed to read number of star points from file
(2): No such file or directory

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 54584 - Posted: 1 Jun 2012 | 19:01:14 UTC - in response to Message 54582.

Still getting computation errors
(Not had any errors before today.)

Incorrect function. (0x1) - exit code 1 (0x1)
Failed to read number of star points from file
(2): No such file or directory


Looks like Matt N. gave me a bad star file. Started up 'ps_separation_14_2s_null_3_v2', hopefully that will fix it.
____________

Jimmy Gondek
Send message
Joined: 28 Sep 11
Posts: 55
Credit: 10,558,354
RAC: 21,545
Message 54586 - Posted: 1 Jun 2012 | 19:17:33 UTC

...nope, nothing comin' out of the hose...you sure the water's turned on?...

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 54587 - Posted: 1 Jun 2012 | 19:27:04 UTC - in response to Message 54586.

...nope, nothing comin' out of the hose...you sure the water's turned on?...


Just made 500 more workunits from the new search.
____________

Profile tomast
Avatar
Send message
Joined: 9 May 12
Posts: 12
Credit: 10,339,447
RAC: 0
Message 54588 - Posted: 1 Jun 2012 | 19:42:00 UTC

_V2 still the same error right at the end of processing.
http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=225454907

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 54589 - Posted: 1 Jun 2012 | 20:09:18 UTC - in response to Message 54588.

_V2 still the same error right at the end of processing.
http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=225454907


I'm looking into this, seems like something weird is going on with the star files Matt N. gave me.
____________

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 54591 - Posted: 1 Jun 2012 | 20:21:33 UTC - in response to Message 54589.

Looks like _v2 might have been using the old wrong star file. I'm hoping v3 fixes that.
____________

Profile Ray_GTI-R
Avatar
Send message
Joined: 5 Nov 10
Posts: 56
Credit: 10,055,630
RAC: 39,138
Message 54593 - Posted: 1 Jun 2012 | 21:39:36 UTC - in response to Message 54591.

Same for me, Computer 427419.
Will credits be given for completed work that fails this way?
Thanks.

Sunny129
Avatar
Send message
Joined: 25 Jan 11
Posts: 249
Credit: 165,090,652
RAC: 523,887
Message 54594 - Posted: 1 Jun 2012 | 21:49:51 UTC

thank god others are having the same problem LOL. i've been pulling my hair out for the last hour trying to figure out why tasks are essentially running to completion and then erroring out at the last second...i feel much better now that i know it's a server-side issue.
____________

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 54595 - Posted: 1 Jun 2012 | 21:53:45 UTC - in response to Message 54594.

thank god others are having the same problem LOL. i've been pulling my hair out for the last hour trying to figure out why tasks are essentially running to completion and then erroring out at the last second...i feel much better now that i know it's a server-side issue.


From what I can tell, it looks like the newly generating 'ps_separation_14_2s_null_3_v3' workunits are crunching and validating, so I think we're in the clear from here on out.
____________

Sunny129
Avatar
Send message
Joined: 25 Jan 11
Posts: 249
Credit: 165,090,652
RAC: 523,887
Message 54596 - Posted: 1 Jun 2012 | 21:58:40 UTC - in response to Message 54595.

From what I can tell, it looks like the newly generating 'ps_separation_14_2s_null_3_v3' workunits are crunching and validating, so I think we're in the clear from here on out.

thanks for the update Travis. i wouldn't know yet, as i immediately suspended all MW@H work as soon as i saw WU's erroring out. now that i've just discovered the nature of the problem, i can resume crunching the remaining MW@H tasks in my queue (even though i know they'll error out). once those tasks have cleared my host, i can test the ps_separation_14_2s_null_3_v3 WU's and confirm whether or not the errors are gone...that is, if someone doesn't beat me to it.
____________

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 54597 - Posted: 1 Jun 2012 | 22:01:04 UTC - in response to Message 54596.

From what I can tell, it looks like the newly generating 'ps_separation_14_2s_null_3_v3' workunits are crunching and validating, so I think we're in the clear from here on out.

thanks for the update Travis. i wouldn't know yet, as i immediately suspended all MW@H work as soon as i saw WU's erroring out. now that i've just discovered the nature of the problem, i can resume crunching the remaining MW@H tasks in my queue (even though i know they'll error out). once those tasks have cleared my host, i can test the ps_separation_14_2s_null_3_v3 WU's and confirm whether or not the errors are gone...that is, if someone doesn't beat me to it.



Well, I've gotten back a bunch of successful ps_separation_14_2s_null_3_v3 results, so it's looking like from here on out things will be good unless I screw something else up. I've actually been surprised at how smoothly things have been going so far (considering it was a total reimplementation). I did a lot of offline testing but there are always kinks to work out when something like that goes live. Of course, I'm probably shooting myself in the foot by saying that, so expect incoming catastrophic errors. :P
____________

Sunny129
Avatar
Send message
Joined: 25 Jan 11
Posts: 249
Credit: 165,090,652
RAC: 523,887
Message 54598 - Posted: 1 Jun 2012 | 22:05:39 UTC

ok, the ps_separation_14_2s_null_3_v3 are crunching to completion without errors...so it seems all is well for the time being.
____________

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 54599 - Posted: 1 Jun 2012 | 22:09:53 UTC - in response to Message 54598.

I've also started a DE search: 'de_separation_14_2s_05_3'. It's using a different star file (but correctly formatted as far as I can tell), so let me know if those are crunching correctly as well.
____________

Profile tomast
Avatar
Send message
Joined: 9 May 12
Posts: 12
Credit: 10,339,447
RAC: 0
Message 54601 - Posted: 1 Jun 2012 | 22:50:13 UTC

So far so good ;-) mostly...

null_3_v4 --- Completed, validation inconclusive (all good so far)
05_3 --- Completed, validation inconclusive (all good so far)
sample_1 --- Completed and validated (all good so far)
null_3_v2 --- Computation error

Profile Ray_GTI-R
Avatar
Send message
Joined: 5 Nov 10
Posts: 56
Credit: 10,055,630
RAC: 39,138
Message 54602 - Posted: 1 Jun 2012 | 23:40:18 UTC - in response to Message 54601.

GPU-only tasks tested ...

PC #A
Just completed a ps_separation_09 task, OK
All else fails with computation error at 100% completion:- ps_separation_14_2s_null_3_v2, v3, v4, ps_separation_14_2s_05_03
I have restarted, cold booted and detached/reattached. Same problem as above.
Result:-
Will suspend project and abort existing ps_separation_14 tasks until a fix is in place.

PC #B (only now switched it on after a couple of days, so no recent tasks have been loaded yet)
Completing ps_separation_09 tasks (7 of them so far), OK
Result:-
I've switched to "No new tasks" for now.

For those with headless/unattended servers, you're going to either be busy for a while or else waste a lot of electricity doing nothing until a fix is found.

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 54603 - Posted: 1 Jun 2012 | 23:48:28 UTC - in response to Message 54602.

GPU-only tasks tested ...

PC #A
Just completed a ps_separation_09 task, OK
All else fails with computation error at 100% completion:- ps_separation_14_2s_null_3_v2, v3, v4, ps_separation_14_2s_05_03
I have restarted, cold booted and detached/reattached. Same problem as above.
Result:-
Will suspend project and abort existing ps_separation_14 tasks until a fix is in place.

PC #B (only now switched it on after a couple of days, so no recent tasks have been loaded yet)
Completing ps_separation_09 tasks (7 of them so far), OK
Result:-
I've switched to "No new tasks" for now.

For those with headless/unattended servers, you're going to either be busy for a while or else waste a lot of electricity doing nothing until a fix is found.


Very strange, it looks like they ran successfully. I have no clue why the client would have marked them as errors. Send Matt A. a message so hopefully he'll know what the issue is.
____________

Profile Ray_GTI-R
Avatar
Send message
Joined: 5 Nov 10
Posts: 56
Credit: 10,055,630
RAC: 39,138
Message 54604 - Posted: 2 Jun 2012 | 0:22:02 UTC - in response to Message 54603.
Last modified: 2 Jun 2012 | 0:57:18 UTC

To be clear:-

PC #A is 446288 (all but one old WU consistently fail, post-e.g., 1st June, 22:00 UTC).
PC #B is 231173 (all old tasks complete OK, no new tasks for days as switched off).

Can you point me to where you see PC #A (446288, post- 1st June, about 22:00 UTC) succeed processing any but that one [old] GPU task?
Apologies if I've got any of this wrong. I'm not an expert, just a cruncher.

Robert Gammon
Send message
Joined: 29 Nov 10
Posts: 3
Credit: 1,794,442
RAC: 344
Message 54605 - Posted: 2 Jun 2012 | 0:27:10 UTC - in response to Message 54597.

Just had 6 wus abort with computation error, all were ps separation null3 v4 units.

This machine has had zero problems with any wus prior to today

Dataman
Avatar
Send message
Joined: 5 Sep 08
Posts: 14
Credit: 100,034,466
RAC: 12
Message 54607 - Posted: 2 Jun 2012 | 13:32:19 UTC

Same here ... all errored out. I will leave a couple of cards here in case you need some testers. Good luck!
____________

Profile ^..^~~
Send message
Joined: 22 Oct 11
Posts: 23
Credit: 15,424,029
RAC: 16,699
Message 54608 - Posted: 2 Jun 2012 | 14:44:40 UTC

My systems are not able to get new work units to even check and see if and when things are fixed.
^..^~~

Angelika
Send message
Joined: 25 May 11
Posts: 3
Credit: 284,058
RAC: 14
Message 54609 - Posted: 2 Jun 2012 | 15:54:26 UTC

all 4 WUs with errors - cannot get new tasks

Profile ^..^~~
Send message
Joined: 22 Oct 11
Posts: 23
Credit: 15,424,029
RAC: 16,699
Message 54610 - Posted: 2 Jun 2012 | 16:23:56 UTC

I'd say "the pooch got screwed" with this update! Ha!
Too funny!

^..^~~

Dataman
Avatar
Send message
Joined: 5 Sep 08
Posts: 14
Credit: 100,034,466
RAC: 12
Message 54611 - Posted: 2 Jun 2012 | 17:10:11 UTC

Just a personal request but if you can turn on that which is necessary for us to upload completed wu's and errors it would be most helpful. I was running quite a number of cards when the problems occurred. When I look at my master console with BoincTasks, all those red lines give me heart palpitations. Hahaha

No worries if that is not feasible. Have a good weekend.
____________

BarryAZ
Send message
Joined: 1 Sep 08
Posts: 512
Credit: 223,312,510
RAC: 162,328
Message 54613 - Posted: 2 Jun 2012 | 17:14:35 UTC

I just did the suspend thing. Once we 'return to the future' here I'll likely do project resets or detach and rejoins just to clear out the cobwebs.

Perhaps some information might be useful though.

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 54615 - Posted: 2 Jun 2012 | 17:54:07 UTC - in response to Message 54603.

GPU-only tasks tested ...

PC #A
Just completed a ps_separation_09 task, OK
All else fails with computation error at 100% completion:- ps_separation_14_2s_null_3_v2, v3, v4, ps_separation_14_2s_05_03
I have restarted, cold booted and detached/reattached. Same problem as above.
Result:-
Will suspend project and abort existing ps_separation_14 tasks until a fix is in place.

PC #B (only now switched it on after a couple of days, so no recent tasks have been loaded yet)
Completing ps_separation_09 tasks (7 of them so far), OK
Result:-
I've switched to "No new tasks" for now.

For those with headless/unattended servers, you're going to either be busy for a while or else waste a lot of electricity doing nothing until a fix is found.


Very strange, it looks like they ran successfully. I have no clue why the client would have marked them as errors. Send Matt A. a message so hopefully he'll know what the issue is.


So Matt A. says this is most likely a problem with using an older version of the ATI application.

____________

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 54616 - Posted: 2 Jun 2012 | 17:54:49 UTC - in response to Message 54611.

Just a personal request but if you can turn on that which is necessary for us to upload completed wu's and errors it would be most helpful. I was running quite a number of cards when the problems occurred. When I look at my master console with BoincTasks, all those red lines give me heart palpitations. Hahaha

No worries if that is not feasible. Have a good weekend.


Things are back on, hopefully I get some more information about why some clients are erroring out on the workunits.
____________

Sunny129
Avatar
Send message
Joined: 25 Jan 11
Posts: 249
Credit: 165,090,652
RAC: 523,887
Message 54618 - Posted: 2 Jun 2012 | 18:26:42 UTC

i've run ~10 tasks since the server went back up, and only one errored out (a null_3 task). my first de_separation task crunched without error though.
____________

Profile tomast
Avatar
Send message
Joined: 9 May 12
Posts: 12
Credit: 10,339,447
RAC: 0
Message 54619 - Posted: 2 Jun 2012 | 18:38:48 UTC
Last modified: 2 Jun 2012 | 18:58:41 UTC

We have run over 80 on AMD GPU today (No errors)
At least a hundred or more went thru yesterday and the only errors
were a couple _v2 that sneaked in. ;-)

Even ran 8 thru my much slower nvidia GPU. (No errors)

Profile tomast
Avatar
Send message
Joined: 9 May 12
Posts: 12
Credit: 10,339,447
RAC: 0
Message 54620 - Posted: 2 Jun 2012 | 20:13:38 UTC
Last modified: 2 Jun 2012 | 20:27:10 UTC

Going good today but... another _v2 just slipped through
and as before (Computation error) right at the end. [stars file]
Too bad they don't error at the start...
Mine only took (40.35) on GPU, but wingman took (28,298.82) on CPU. Ouch!
http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=178264077
Can't these _v2 be weeded out rather than just sending more copies ?

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 54621 - Posted: 2 Jun 2012 | 20:30:05 UTC - in response to Message 54620.


Can't these _v2 be weeded out rather than just sending more copies ?


Done, they shouldn't be getting sent out anymore.
____________

Profile Ray_GTI-R
Avatar
Send message
Joined: 5 Nov 10
Posts: 56
Credit: 10,055,630
RAC: 39,138
Message 54622 - Posted: 2 Jun 2012 | 20:36:00 UTC

Nope:- still erroring, details below ...

Task 225600128
Ray_GTI-R | log out
Name ps_separation_14_2s_null_3_v4_1338660883_41090_1
Workunit 178349109
Created 2 Jun 2012 | 20:15:08 UTC
Sent 2 Jun 2012 | 20:15:53 UTC
Received 2 Jun 2012 | 20:32:07 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 1 (0x1) Unknown error number
Computer ID 231173
Report deadline 14 Jun 2012 | 20:15:53 UTC
Run time 448.03
CPU time 4.86
Validate state Invalid
Credit 0.00
Application version MilkyWay@Home v0.82 (ati14)

Stderr output
<core_client_version>6.12.34</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
Using SSE2 path
Found 1 CAL devices
Chose device 0

Device target: CAL_TARGET_670
Revision: 41
CAL Version: 1.4.1523
Engine clock: 668 Mhz
Memory clock: 828 Mhz
GPU RAM: 512
Wavefront size: 64
Double precision: CAL_TRUE
Compute shader: CAL_FALSE
Number SIMD: 4
Number shader engines: 1
Pitch alignment: 256
Surface alignment: 256
Max size 2D: { 8192, 8192 }

Estimated iteration time 660.343376 ms
Target frequency 30.000000 Hz, polling mode 1
Dividing into 19 chunks, initially sleeping for 0 ms
Integration range: { nu_steps = 640, mu_steps = 1600, r_steps = 1400 }
Using 19 chunk(s) with sizes: 80 80 80 96 80 80 80 96 80 80 80 96 80 80 96 80 80 80 96
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Integration time = 442.362278 s, average per iteration = 691.191059 ms
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Failed to map resource: Operational error (CAL_RESULT_ERROR)
Failed to release CAL resource
Integral 0 time = 446.066926 s
Failed to calculate integral 0
21:31:13 (2680): called boinc_finish

</stderr_txt>
]]>


--------------------------------------------------------------------------------

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 54623 - Posted: 2 Jun 2012 | 20:41:30 UTC - in response to Message 54622.

Nope:- still erroring, details below ...



Matt A. says it's because of the old version of your ATI application.
____________

Link
Avatar
Send message
Joined: 19 Jul 10
Posts: 278
Credit: 10,228,179
RAC: 8,533
Message 54624 - Posted: 2 Jun 2012 | 20:50:54 UTC - in response to Message 54623.
Last modified: 2 Jun 2012 | 20:56:07 UTC

Matt A. says it's because of the old version of your ATI application.

Does that basically mean that the new tasks are incompatible with HD38x0 cards?
____________
.

Link
Avatar
Send message
Joined: 19 Jul 10
Posts: 278
Credit: 10,228,179
RAC: 8,533
Message 54629 - Posted: 2 Jun 2012 | 22:02:09 UTC - in response to Message 54624.
Last modified: 2 Jun 2012 | 22:09:55 UTC

OK, they all error out; however, this is what I get as stderr:

<core_client_version>6.12.34</core_client_version>
<![CDATA[
<message>
- exit code -1073740940 (0xc0000374)
</message>
<stderr_txt>
Using SSE3 path
Found 1 CAL devices
Chose device 0

Device target: CAL_TARGET_670
Revision: 41
CAL Version: 1.4.1546
Engine clock: 720 Mhz
Memory clock: 900 Mhz
GPU RAM: 512
Wavefront size: 64
Double precision: CAL_TRUE
Compute shader: CAL_FALSE
Number SIMD: 4
Number shader engines: 1
Pitch alignment: 256
Surface alignment: 4096
Max size 2D: { 8192, 8192 }

Estimated iteration time 612.651910 ms
Target frequency 120.000000 Hz, polling mode 4
Dividing into 73 chunks, initially sleeping for 0 ms
Integration range: { nu_steps = 640, mu_steps = 1600, r_steps = 1400 }
Using 73 chunk(s) with sizes: 16 16 32 16 16 32 16 32 16 16 32 16 16 32 16 16 32 16 32 16 16 32 16 16 32 16 32 16 16 32 16 16 32 16 32 16 16 32 16 16 32 16 16 32 16 32 16 16 32 16 16 32 16 32 16 16 32 16 16 32 16 16 32 16 32 16 16 32 16 16 32 16 32
Integration time = 817.866787 s, average per iteration = 1277.916855 ms
Integral 0 time = 819.342291 s
Likelihood time = 2.521662 s
<background_integral> 0.000697824013171 </background_integral>
<stream_integral> 1358.169746558015000 423.166163591165340 </stream_integral>
<background_likelihood> -2.988655550325422 </background_likelihood>
<stream_only_likelihood> -7.295573957091166 -9.251895081428136 </stream_only_likelihood>
<search_likelihood> -2.988610218899344 </search_likelihood>

</stderr_txt>
]]>

so it looks quite normal, only

<search_application> milkywayathome_client separation 0.82 Windows x86_64 double CAL++ </search_application>
08:59:27 (4784): called boinc_finish

is missing after "</search_likelihood>".
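
A small sketch of how the two observations above could be checked automatically; it is only an illustration, not anything the project ships. The first helper reinterprets the signed exit code as an unsigned 32-bit value, which turns -1073740940 into the 0xc0000374 shown in the message (Windows lists that value as STATUS_HEAP_CORRUPTION). The second helper looks for the closing lines that a normal run prints. The function names and the list of expected markers are assumptions based only on the output pasted in this post.

```python
def as_unsigned_hex(exit_code):
    """Reinterpret a signed 32-bit exit code as unsigned hex."""
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


# ASSUMPTION: a clean run ends with these markers, judging only from
# the stderr pasted in this thread.
EXPECTED_MARKERS = (
    "<search_likelihood>",
    "<search_application>",
    "called boinc_finish",
)


def missing_markers(stderr_text):
    """Return the expected markers that do not appear in stderr_text."""
    return [m for m in EXPECTED_MARKERS if m not in stderr_text]


if __name__ == "__main__":
    print(as_unsigned_hex(-1073740940))
    truncated = "<search_likelihood> -2.988610218899344 </search_likelihood>\n"
    print(missing_markers(truncated))
```

If both trailing markers are reported missing while the likelihood is present, the app most likely died after finishing the computation but before shutting down cleanly, which fits what is described here.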

EDIT: OK, I suspended the rest of those tasks for now and will crunch Collatz again until you (hopefully) fix that. Is generating a few more of those errors with the rest of the WUs I have here of any use to you in finding the problem, or shall I just abort them?
____________
.

Profile Ray_GTI-R
Avatar
Send message
Joined: 5 Nov 10
Posts: 56
Credit: 10,055,630
RAC: 39,138
Message 54630 - Posted: 2 Jun 2012 | 22:23:18 UTC - in response to Message 54623.
Last modified: 2 Jun 2012 | 22:24:42 UTC

To recap:- on my PCs ...
The current batch(es) of GPU WUs fail.
The previous batches ran successfully.
Others report the same issue.

There has been no ATI application/driver change at this end.

I thought the only change was to create new work i.e.,

I'm testing work generation right now, there should be workunits available to download now. Let me know how these workunits are crunching!

Matt A. says it's because of the old version of your ATI application.

Has there been an unannounced update to MW@H crunching requirement(s)?

Profile dskagcommunity
Avatar
Send message
Joined: 26 Feb 11
Posts: 155
Credit: 32,274,068
RAC: 167,890
Message 54631 - Posted: 2 Jun 2012 | 22:34:51 UTC
Last modified: 2 Jun 2012 | 22:36:33 UTC

I get these errors here on a 3850:

http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=225449067

But these are from yesterday. Until midnight it seems there are no more WUs coming or it stopped computing (there are 33 WUs in cache not calculating). It's an unattended machine, so I can only have a look at it on Monday.
____________
DSKAG Austria Research Team: http://www.research.dskag.at



Sunny129
Avatar
Send message
Joined: 25 Jan 11
Posts: 249
Credit: 165,090,652
RAC: 523,887
Message 54632 - Posted: 2 Jun 2012 | 23:12:40 UTC
Last modified: 2 Jun 2012 | 23:14:01 UTC

just got another "ps_separation_14_2s_null_3_..." error, though this is only my 2nd error since the server went back up several hours ago. by taking a quick look at my validated results, i'm fairly confident that the above type of task is having a 100% failure rate on my machine. i'm seeing valid "de_separation_14_2s_05_3_..." and "ps_separation_14_2s_null_3_v4_..." tasks, but no valid "ps_separation_14_2s_null_3_..." results.


EDIT - make that 3 "ps_separation_14_2s_null_3_..." errors now.
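
For anyone who wants to do this kind of tally without eyeballing the results page, here is a rough sketch. It assumes a task name is the search name followed by three numeric fields, which is a guess based on the one full name posted in this thread (ps_separation_14_2s_null_3_v4_1338660883_41090_1). The function names are made up, and every task name in the sample list except that one is invented purely for illustration.

```python
import re
from collections import defaultdict


def search_name(task_name):
    """Strip the three trailing numeric fields (assumed naming scheme)."""
    return re.sub(r"(_\d+){3}$", "", task_name)


def failure_rates(results):
    """Map search name -> (errors, total) from (task_name, outcome) pairs."""
    tally = defaultdict(lambda: [0, 0])
    for task_name, outcome in results:
        entry = tally[search_name(task_name)]
        entry[1] += 1
        if outcome == "Computation error":
            entry[0] += 1
    return {name: (errs, total) for name, (errs, total) in tally.items()}


if __name__ == "__main__":
    # Illustrative data only: apart from the first name, these tasks
    # and outcomes are invented and do not come from any real host.
    sample = [
        ("ps_separation_14_2s_null_3_v4_1338660883_41090_1", "Computation error"),
        ("ps_separation_14_2s_null_3_v4_1338660883_41091_0", "Completed and validated"),
        ("ps_separation_14_2s_null_3_1338570000_100_0", "Computation error"),
        ("de_separation_14_2s_05_3_1338580000_200_0", "Completed and validated"),
    ]
    for name, (errs, total) in sorted(failure_rates(sample).items()):
        print(f"{name}: {errs}/{total} errored")
```

Grouping this way keeps the original null_3 tasks separate from the _v4 ones, which is the distinction that matters here.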
____________

Profile Ray_GTI-R
Avatar
Send message
Joined: 5 Nov 10
Posts: 56
Credit: 10,055,630
RAC: 39,138
Message 54634 - Posted: 3 Jun 2012 | 1:13:27 UTC - in response to Message 54630.

... PS:- Collatz still happily crunching brand new* GPU tasks using the same "old version of your ATI application".

*downloaded from 9pm today (2nd June).

As I said, nothing new here aside from the latest MW@H GPU tasks consistently failing.

Profile RAMen
Avatar
Send message
Joined: 8 Apr 08
Posts: 41
Credit: 133,657,789
RAC: 198,590
Message 54635 - Posted: 3 Jun 2012 | 3:03:23 UTC
Last modified: 3 Jun 2012 | 3:08:00 UTC

Work Distribution resumed my time (UTC +8) at 3.09am 3rd June.
Time now 11.01am

All: ps_separation_14_2s_null_3_v4 tasks completing as expected

processor: Q9300
OS :WinXP pro
boinc: 7.0.24 (x86)
application: milkyway 1.02
GPU ati5850

No errors to report today
____________

OWN every thing I need
EARN.. enough to live !!!
WANT a solar array on the roof so I can run a BOINC farm( DREAM on!!)
NO wife
NO kids
NO troubles

Profile ^..^~~
Send message
Joined: 22 Oct 11
Posts: 23
Credit: 15,424,029
RAC: 16,699
Message 54636 - Posted: 3 Jun 2012 | 4:41:24 UTC

9:39pm Saturday night California time and work units again stop coming.
^..^~~

Profile Chris Pauquette
Send message
Joined: 26 Jan 10
Posts: 1
Credit: 582,756
RAC: 0
Message 54639 - Posted: 3 Jun 2012 | 9:19:44 UTC

i have done several of these with no problems.

_Chris

Link
Avatar
Send message
Joined: 19 Jul 10
Posts: 278
Credit: 10,228,179
RAC: 8,533
Message 54644 - Posted: 3 Jun 2012 | 13:19:16 UTC - in response to Message 54629.
Last modified: 3 Jun 2012 | 13:22:46 UTC

EDIT: OK, I suspended the rest of those tasks for now and will crunch Collatz again until you (hopefully) fix that. Is generating a few more of those errors with the rest of the WUs I have here of any use to you in finding the problem, or shall I just abort them?

I aborted all those tasks and an HD5800 series card has crunched them successfully. I still hope you can fix this, or at least that the next batch of tasks is going to be compatible with the CAL application again, otherwise all of us with HD38x0 based cards will have to find new projects for our GPUs.

BTW, the results of the tasks that ended with an error were OK compared with those from the machines that got them as resends, so in general the app seems to crunch those WUs correctly.
____________
.

Profile RAMen
Avatar
Send message
Joined: 8 Apr 08
Posts: 41
Credit: 133,657,789
RAC: 198,590
Message 54646 - Posted: 3 Jun 2012 | 13:54:53 UTC
Last modified: 3 Jun 2012 | 13:58:50 UTC

Just completed 10 new tasks
ps_separation_14_2s_null_3_v4
all finished correctly

Host ID 14432 ---> ati 5850

Profile Keith Myers
Send message
Joined: 24 Jan 11
Posts: 51
Credit: 32,646,477
RAC: 54,619
Message 54647 - Posted: 3 Jun 2012 | 14:57:58 UTC

All of mine have errored out after completing.

Stderr output

<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
BOINC: parse gpu_opencl_dev_index 1
<search_application> milkyway_separation 1.02 Windows x86_64 double OpenCL </search_application>
Unrecognized XML in project preferences: max_gfx_cpu_pct
Skipping: 100
Skipping: /max_gfx_cpu_pct
Unrecognized XML in project preferences: apps_selected
Skipping: app_id
Skipping: /apps_selected
Unrecognized XML in project preferences: nbody_graphics_poll_period
Skipping: 30
Skipping: /nbody_graphics_poll_period
Unrecognized XML in project preferences: nbody_graphics_float_speed
Skipping: 5
Skipping: /nbody_graphics_float_speed
Unrecognized XML in project preferences: nbody_graphics_textured_point_size
Skipping: 250
Skipping: /nbody_graphics_textured_point_size
Unrecognized XML in project preferences: nbody_graphics_point_point_size
Skipping: 40
Skipping: /nbody_graphics_point_point_size
BOINC GPU type suggests using OpenCL vendor 'NVIDIA Corporation'
Using SSE3 path
Found 1 platform
Platform 0 information:
Name: NVIDIA CUDA
Version: OpenCL 1.1 CUDA 4.2.1
Vendor: NVIDIA Corporation
Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
Profile: FULL_PROFILE
Using device 1 on platform 0
Found 2 CL devices
Device 'GeForce GTX 560 Ti' (NVIDIA Corporation:0x10de) (CL_DEVICE_TYPE_GPU)
Driver version: 301.42
Version: OpenCL 1.1 CUDA
Compute capability: 2.1
Max compute units: 8
Clock frequency: 1644 Mhz
Global mem size: 1073545216
Local mem size: 49152
Max const buf size: 65536
Double extension: cl_khr_fp64
Build log:
--------------------------------------------------------------------------------

ptxas info : Compiling entry function 'probabilities' for 'sm_21'
ptxas info : Function properties for probabilities
80 bytes stack frame, 76 bytes spill stores, 76 bytes spill loads
ptxas info : Used 63 registers, 100 bytes cmem[0], 56 bytes cmem[16]
--------------------------------------------------------------------------------
Build log:
--------------------------------------------------------------------------------


--------------------------------------------------------------------------------
Estimated Nvidia GPU GFLOP/s: 842 SP GFLOP/s, 105 DP FLOP/s
Using a target frequency of 60.0
Using a block size of 4096 with 17 blocks/chunk
Using clWaitForEvents() for polling with initial wait of 12 ms (mode 0)
Range: { nu_steps = 640, mu_steps = 1600, r_steps = 1400 }
Iteration area: 2240000
Chunk estimate: 32
Num chunks: 33
Chunk size: 69632
Added area: 57856
Effective area: 2297856
Initial wait: 12 ms
Integration time: 673.373698 s. Average time per iteration = 1052.146403 ms
Integral 0 time = 675.587139 s
Failed to read number of star points from file
(2): No such file or directory
Failed to calculate likelihood
<background_integral> 0.001803516386326 </background_integral>
<stream_integral> 358.404514961074600 774.535266066600000 </stream_integral>
<background_likelihood> 1.#QNAN0000000000 </background_likelihood>
<stream_only_likelihood> 1.#QNAN0000000000 1.#QNAN0000000000 </stream_only_likelihood>
<search_likelihood> 1.#QNAN0000000000 </search_likelihood>
17:49:37 (3316): called boinc_finish

</stderr_txt>
]]>
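
The figures in the log above are internally consistent, which suggests the integration itself ran to completion and the failure only came afterwards, when the star file could not be read. A minimal sketch that reproduces them, assuming (this is inferred from the printed values, not taken from the application source) that the iteration area is mu_steps x r_steps, that a chunk is block size x blocks per chunk, and that the average time is the integration time divided by nu_steps:

```python
# Values copied from the stderr output above.
NU_STEPS, MU_STEPS, R_STEPS = 640, 1600, 1400
BLOCK_SIZE, BLOCKS_PER_CHUNK = 4096, 17
INTEGRATION_TIME_S = 673.373698


def chunk_plan(mu_steps, r_steps, block_size, blocks_per_chunk):
    """Recompute the chunking figures printed in the log.

    The formulas are assumptions inferred from the log output,
    not the application's actual code.
    """
    area = mu_steps * r_steps
    chunk_size = block_size * blocks_per_chunk
    chunk_estimate = area // chunk_size   # whole chunks that fit (rounded down)
    num_chunks = -(-area // chunk_size)   # chunks needed to cover the area (rounded up)
    effective_area = num_chunks * chunk_size
    return {
        "iteration_area": area,
        "chunk_estimate": chunk_estimate,
        "num_chunks": num_chunks,
        "chunk_size": chunk_size,
        "added_area": effective_area - area,
        "effective_area": effective_area,
    }


plan = chunk_plan(MU_STEPS, R_STEPS, BLOCK_SIZE, BLOCKS_PER_CHUNK)
avg_ms = INTEGRATION_TIME_S / NU_STEPS * 1000.0

for name, value in plan.items():
    print(f"{name}: {value}")
print(f"average time per iteration: {avg_ms:.6f} ms")
```

Each value matches the corresponding line of the log (2240000, 32, 33, 69632, 57856, 2297856 and roughly 1052.146 ms per iteration).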

____________

Profile dskagcommunity
Avatar
Send message
Joined: 26 Feb 11
Posts: 155
Credit: 32,274,068
RAC: 167,890
Message 54648 - Posted: 3 Jun 2012 | 16:22:39 UTC - in response to Message 54644.

EDIT: OK, I suspended the rest of those tasks for now and will crunch Collatz again until you (hopefully) fix that. Is generating few more of those errors with the rest of WUs, that I have here, of any use for you to find the problem, or shall I just abort them?

I aborted all those tasks and a HD5800 series card has crunched them successfully. I still hope you can fix this, or at least that the next batch of tasks is going to be compatible with the CAL application again, otherwise all of us with HD38x0 based cards will have to find new projects for our GPUs.

BTW, the results for the tasks that ended with an error were OK when compared with those from the machines that got them as resends, so in general the app seems to crunch those WUs right.


Unfortunately this was the last science project for these cards :/ so I too hope this is recoverable in some way for 38x0 cards, or I can't continue computing for MW, because it was the only card I had spare for this. :/
____________
DSKAG Austria Research Team: http://www.research.dskag.at



Link
Avatar
Send message
Joined: 19 Jul 10
Posts: 278
Credit: 10,228,179
RAC: 8,533
Message 54649 - Posted: 3 Jun 2012 | 17:28:44 UTC - in response to Message 54648.
Last modified: 3 Jun 2012 | 17:35:41 UTC

Unfortunately this was the last science project for these cards :/ so I too hope this is recoverable in some way for 38x0 cards, or I can't continue computing for MW, because it was the only card I had spare for this. :/

Well, there's still Collatz (which is my backup project in case MilkyWay has some issues) and Moo! Wrapper. I haven't tested Moo yet, but they at least have an ati14 app in their list. OTOH, I'm always a little unsure about using the word "science" when talking about these projects.
____________
.

Profile dskagcommunity
Avatar
Send message
Joined: 26 Feb 11
Posts: 155
Credit: 32,274,068
RAC: 167,890
Message 54651 - Posted: 3 Jun 2012 | 18:01:35 UTC
Last modified: 3 Jun 2012 | 18:03:10 UTC

Yes, I meant science projects, not energy-wasting projects ^^

PS: moo can run on a 38x0, yes.
____________
DSKAG Austria Research Team: http://www.research.dskag.at



bob
Send message
Joined: 12 Apr 09
Posts: 12
Credit: 40,832,932
RAC: 31,309
Message 54652 - Posted: 3 Jun 2012 | 18:19:13 UTC

Per the very thoughtful suggestion on another thread: does anyone have the faintest idea what the issue is?

Here is the dump on just one of the 21 that ended the same way.

Does anyone know what this really means? Because I have no idea, but 21 tasks all pretty much aborted in the same manner.

Boinc 7.0.25
ATI 6590 card with 8.961.0.0 Driver
Windows XP Service Pack 3, 32 Bit.

Stderr output

<core_client_version>7.0.25</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
BOINC: parse gpu_opencl_dev_index 0
<search_application> milkyway_separation 1.02 Windows x86 double OpenCL </search_application>
Unrecognized XML in project preferences: max_gfx_cpu_pct
Skipping: 20
Skipping: /max_gfx_cpu_pct
Unrecognized XML in project preferences: allow_non_preferred_apps
Skipping: 1
Skipping: /allow_non_preferred_apps
Unrecognized XML in project preferences: nbody_graphics_poll_period
Skipping: 30
Skipping: /nbody_graphics_poll_period
Unrecognized XML in project preferences: nbody_graphics_float_speed
Skipping: 5
Skipping: /nbody_graphics_float_speed
Unrecognized XML in project preferences: nbody_graphics_textured_point_size
Skipping: 250
Skipping: /nbody_graphics_textured_point_size
Unrecognized XML in project preferences: nbody_graphics_point_point_size
Skipping: 40
Skipping: /nbody_graphics_point_point_size
BOINC GPU type suggests using OpenCL vendor 'Advanced Micro Devices, Inc.'
Using SSE3 path
Found 1 platform
Platform 0 information:
Name: ATI Stream
Version: OpenCL 1.1 ATI-Stream-v2.3 (451)
Vendor: Advanced Micro Devices, Inc.
Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
Profile: FULL_PROFILE
Using device 0 on platform 0
Found 1 CL device
Device 'Cayman' (Advanced Micro Devices, Inc.:0x1002) (CL_DEVICE_TYPE_GPU)
Driver version: CAL 1.4.1546
Version: OpenCL 1.1 ATI-Stream-v2.3 (451)
Compute capability: 0.0
Max compute units: 22
Clock frequency: 825 Mhz
Global mem size: 1073741824
Local mem size: 32768
Max const buf size: 65536
Double extension: cl_amd_fp64
Build log:
--------------------------------------------------------------------------------
C:\DOCUME~1\Beverly1\LOCALS~1\Temp\OCL3CD.tmp.cl(201): error: invalid unroll
factor
#pragma unroll NSTREAM
^

C:\DOCUME~1\Beverly1\LOCALS~1\Temp\OCL3CD.tmp.cl(243): error: invalid unroll
factor
#pragma unroll NSTREAM
^

C:\DOCUME~1\Beverly1\LOCALS~1\Temp\OCL3CD.tmp.cl(272): error: invalid unroll
factor
#pragma unroll NSTREAM
^

C:\DOCUME~1\Beverly1\LOCALS~1\Temp\OCL3CD.tmp.cl(279): error: invalid unroll
factor
#pragma unroll NSTREAM
^

C:\DOCUME~1\Beverly1\LOCALS~1\Temp\OCL3CD.tmp.cl(287): error: invalid unroll
factor
#pragma unroll NSTREAM
^

5 errors detected in the compilation of "C:\DOCUME~1\Beverly1\LOCALS~1\Temp\OCL3CD.tmp.cl".
--------------------------------------------------------------------------------
clBuildProgram: Build failure (-11): CL_BUILD_PROGRAM_FAILURE
Error building program from source (-11): CL_BUILD_PROGRAM_FAILURE
Error creating integral program from source
Failed to calculate likelihood
<background_integral> 1.#QNAN0000000000 </background_integral>
<stream_integral> 1.#QNAN0000000000 1.#QNAN0000000000 </stream_integral>
<background_likelihood> 1.#QNAN0000000000 </background_likelihood>
<stream_only_likelihood> 1.#QNAN0000000000 1.#QNAN0000000000 </stream_only_likelihood>
<search_likelihood> 1.#QNAN0000000000 </search_likelihood>
06:25:53 (3284): called boinc_finish

</stderr_txt>
]]>
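
Reading the dump above: the OpenCL kernel never compiled. The compiler rejected every `#pragma unroll NSTREAM` line, clBuildProgram returned -11 (CL_BUILD_PROGRAM_FAILURE), and so the integrals were never computed and were reported as NaN. Why this compiler treats the unroll factor as invalid is not confirmed anywhere in the thread. The sketch below only pulls the failing source lines out of such a build log and checks them against the compiler's own summary. The file name `kernel.cl` is a hypothetical stand-in for the temp path, and the regular expressions are assumptions based on the log format shown here.

```python
import re

# Build-log excerpt from the dump above, with the temp path shortened to the
# hypothetical name "kernel.cl". The wrapped "factor" lines are kept as posted.
BUILD_LOG = """\
kernel.cl(201): error: invalid unroll
factor
#pragma unroll NSTREAM
^

kernel.cl(243): error: invalid unroll
factor
#pragma unroll NSTREAM
^

kernel.cl(272): error: invalid unroll
factor
#pragma unroll NSTREAM
^

kernel.cl(279): error: invalid unroll
factor
#pragma unroll NSTREAM
^

kernel.cl(287): error: invalid unroll
factor
#pragma unroll NSTREAM
^

5 errors detected in the compilation of "kernel.cl".
"""

# "<file>(<line>): error: <message>" as it appears in this particular log.
ERROR_RE = re.compile(r"^(?P<file>.+)\((?P<line>\d+)\): error: (?P<msg>.*)$", re.M)
# "<n> errors detected ..." summary line printed at the end of the log.
SUMMARY_RE = re.compile(r"^(?P<count>\d+) errors? detected", re.M)


def summarize_build_log(log):
    """Collect the source line numbers that failed and compare the count
    with the total the compiler reports in its summary line."""
    error_lines = [int(m.group("line")) for m in ERROR_RE.finditer(log)]
    summary = SUMMARY_RE.search(log)
    reported = int(summary.group("count")) if summary else None
    return {
        "error_lines": error_lines,
        "reported": reported,
        "consistent": reported == len(error_lines),
    }


result = summarize_build_log(BUILD_LOG)
print(result)
```

All five errors point at the same pragma, so this looks like one problem repeated five times rather than five separate ones.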

Profile ^..^~~
Send message
Joined: 22 Oct 11
Posts: 23
Credit: 15,424,029
RAC: 16,699
Message 54653 - Posted: 3 Jun 2012 | 19:38:06 UTC

Anybody out there getting work units? My computers stopped receiving them around nine last night.
^..^~~

Profile dskagcommunity
Avatar
Send message
Joined: 26 Feb 11
Posts: 155
Credit: 32,274,068
RAC: 167,890
Message 54654 - Posted: 3 Jun 2012 | 19:51:21 UTC

On the server status page you can see there is no work left. The work generator service is stopped.
____________
DSKAG Austria Research Team: http://www.research.dskag.at



Profile dskagcommunity
Avatar
Send message
Joined: 26 Feb 11
Posts: 155
Credit: 32,274,068
RAC: 167,890
Message 54677 - Posted: 5 Jun 2012 | 23:14:30 UTC

OK works again (on the 4850) thx for bringing the work generator back.

So the other question, directed at Travis: is the 38x0 really dead now, or can we still expect something? I'd just like to know whether I can deactivate it. It still computes to 100% in the usual 7 minutes, but it doesn't stop computing at 100%, so the WU never finishes and never gets uploaded.


____________
DSKAG Austria Research Team: http://www.research.dskag.at



Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 54678 - Posted: 6 Jun 2012 | 0:55:39 UTC - in response to Message 54677.

OK works again (on the 4850) thx for bringing the work generator back.

So the other question, directed at Travis: is the 38x0 really dead now, or can we still expect something? I'd just like to know whether I can deactivate it. It still computes to 100% in the usual 7 minutes, but it doesn't stop computing at 100%, so the WU never finishes and never gets uploaded.



What do you mean by 38x0?

--Travis
____________

Sunny129
Avatar
Send message
Joined: 25 Jan 11
Posts: 249
Credit: 165,090,652
RAC: 523,887
Message 54679 - Posted: 6 Jun 2012 | 1:26:34 UTC

i believe he is referring to the HD 3800 series GPU lineup.
____________

Profile arkayn
Avatar
Send message
Joined: 14 Feb 09
Posts: 914
Credit: 74,781,320
RAC: 237
Message 54684 - Posted: 6 Jun 2012 | 5:45:32 UTC - in response to Message 54678.

What do you mean by 38x0?

--Travis


OpenCL does not work on the HD3850 so they have been using the older 0.82 app to crunch with, which was working until the recent changes.
____________

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 54685 - Posted: 6 Jun 2012 | 6:01:06 UTC - in response to Message 54684.

OpenCL does not work on the HD3850 so they have been using the older 0.82 app to crunch with, which was working until the recent changes.


I'm hoping the new search I put up fixes the error with those GPUs. If I get some confirmation on that I'll stop the other ones running and start up some more searches like the new one.
____________

Profile dskagcommunity
Avatar
Send message
Joined: 26 Feb 11
Posts: 155
Credit: 32,274,068
RAC: 167,890
Message 54686 - Posted: 6 Jun 2012 | 10:15:35 UTC - in response to Message 54684.
Last modified: 6 Jun 2012 | 10:22:29 UTC

OpenCL does not work on the HD3850 so they have been using the older 0.82 app to crunch with, which was working until the recent changes.


Good morning!

Correct, it was running with the 0.82 app :)
____________
DSKAG Austria Research Team: http://www.research.dskag.at



Profile JHAPA
Avatar
Send message
Joined: 5 Dec 08
Posts: 4
Credit: 1,810,447
RAC: 0
Message 54691 - Posted: 6 Jun 2012 | 18:36:24 UTC - in response to Message 54686.

Hi,
Is it right that, for the time being, my 3850 gets no new work because the new WUs are failing? And will there be no support for it?
Thanks JHAPA
____________

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 54695 - Posted: 6 Jun 2012 | 22:41:28 UTC - in response to Message 54686.

Good morning!

Correct, it was running with the 0.82 app :)


Okay cool, I'm gonna take down the searches that aren't working on the 38x0 and put up some new ones which should. Hopefully this clears up the issue.
____________

Sunny129
Avatar
Send message
Joined: 25 Jan 11
Posts: 249
Credit: 165,090,652
RAC: 523,887
Message 54778 - Posted: 13 Jun 2012 | 22:21:43 UTC

well it appears this thread has been dormant for a full week now...shame i gotta bring it back. i was happy to say that my HD 6950 had racked up ~9,000 consecutive valid tasks over the last 10 or so days, but today i got another "ps_separation_14_2s_null_3" error, specifically a "ps_separation_14_2s_null_3_1338573431_354_2" task. i don't know if it's anything meaningful or not, but i felt it was my duty as a project participant to at least report it, since it's one of the sub-types of tasks Matt is testing in this thread. i'm hoping it was just a rare glitch, but i have my doubts b/c the stderr output file looks like it crunched normally.

i'll mention that i had a completed task get marked invalid today as well. the stderr output again looked like it ran normally. so i looked at the wingmen and realized that all 4 of them got errors, rendering my result useless. though i know there was no fault on my end, i thought i'd post about it b/c it's a different type of task than the one discussed above...specifically, this task is a "de_separation_14_2s_05_3_test_1_rand_1339497601_758925" task.

anyhow, just thought i'd let the developers and testers know...i'm not reading into it too much at this point, but i'll post more if the errors start rolling in...
____________

nanoprobe
Avatar
Send message
Joined: 27 Jan 12
Posts: 34
Credit: 4,224,533
RAC: 6,942
Message 54785 - Posted: 15 Jun 2012 | 20:09:59 UTC

I just tried a couple of WUs on a 5870 I acquired. Both ran to 100% and then ended with a computation error. New/old problem again?

Link
Avatar
Send message
Joined: 19 Jul 10
Posts: 278
Credit: 10,228,179
RAC: 8,533
Message 54787 - Posted: 15 Jun 2012 | 22:40:01 UTC - in response to Message 54785.

I just tried a couple of WUs on a 5870 I acquired. Both ran to 100% and then ended with a computation error. New/old problem again?

Are you still using the v0.82 CAL app? The current WUs seem to be incompatible with it on 64-bit systems, but your GPU is OpenCL capable, so you could try that.
____________
.

nanoprobe
Avatar
Send message
Joined: 27 Jan 12
Posts: 34
Credit: 4,224,533
RAC: 6,942
Message 54788 - Posted: 15 Jun 2012 | 22:57:32 UTC - in response to Message 54787.

I just tried a couple of WUs on a 5870 I acquired. Both ran to 100% and then ended with a computation error. New/old problem again?

Are you still using the v0.82 CAL app? The current WUs seem to be incompatible with it on 64-bit systems, but your GPU is OpenCL capable, so you could try that.

I've stuck with 0.82 because the later ones caused the GPU usage and power draw to be all over the place, which I was not comfortable with. Even using an app_info to try to control GPU% didn't help. I may try the later app, but I hope they can solve this 64-bit issue.
____________

nanoprobe
Avatar
Send message
Joined: 27 Jan 12
Posts: 34
Credit: 4,224,533
RAC: 6,942
Message 54791 - Posted: 16 Jun 2012 | 14:08:59 UTC
Last modified: 16 Jun 2012 | 14:57:54 UTC

I've been trying to get some new tasks for my XP 32 bit machine without success. Here's the message log.

6/16/2012 10:03:38 AM | Milkyway@Home | work fetch resumed by user
6/16/2012 10:03:40 AM | Milkyway@Home | update requested by user
6/16/2012 10:03:45 AM | Milkyway@Home | Sending scheduler request: Requested by user.
6/16/2012 10:03:45 AM | Milkyway@Home | Not reporting or requesting tasks
6/16/2012 10:03:46 AM | Milkyway@Home | Scheduler request completed

Tried a reset, didn't help.

NM: Got it working.
FWIW I still find that the 1.02 app causes my kill-a-watt meter to fluctuate by as much as 60 watts on 64 bit and 25 watts on 32 bit while I never noticed the 0.82 app fluctuate by more than 5 watts.

Profile Ray_GTI-R
Avatar
Send message
Joined: 5 Nov 10
Posts: 56
Credit: 10,055,630
RAC: 39,138
Message 54794 - Posted: 16 Jun 2012 | 18:09:09 UTC
Last modified: 16 Jun 2012 | 18:10:38 UTC

Sorry to harp on again about the HD3850 AGP issue ...

Replaced Windows 7 SP1 (64-bit) with XP SP3 (32-bit) - so, same hardware. All .NET/drivers/updates/BOINC reinstalled (see previous post for details).
Since last night I have downloaded fresh new WUs and processed over 70 GPU tasks OK (one fail, probably a fluke during update of .NET).
So ...
W7 64-bit/HD3850 AGP & existing MW@H GPU tasks ALL fail with "Computation error"
whereas ...
XP 32-bit/HD3850 AGP & existing MW@H GPU tasks work OK.

What do you reckon?

Link
Avatar
Send message
Joined: 19 Jul 10
Posts: 278
Credit: 10,228,179
RAC: 8,533
Message 54795 - Posted: 16 Jun 2012 | 18:56:42 UTC - in response to Message 54794.
Last modified: 16 Jun 2012 | 18:58:03 UTC

Sorry to harp on again about the HD3850 AGP issue ...

It's not really a HD3850 AGP issue, it's a Win7-x64 CAL application issue, and maybe not even limited to Win7: it could affect all 64-bit Windows versions (even if I don't see anyone here with 64-bit XP or Vista complaining about these errors). My HD3850 is PCIe, and nanoprobe (see a few posts above) has the same issue with his HD5870. So it's not hardware specific, it's just software.
____________
.

Profile dskagcommunity
Avatar
Send message
Joined: 26 Feb 11
Posts: 155
Credit: 32,274,068
RAC: 167,890
Message 54796 - Posted: 16 Jun 2012 | 19:33:45 UTC

I use a 3850 AGP on WinXP32. (only to let ya know it is not 64bit only)
____________
DSKAG Austria Research Team: http://www.research.dskag.at



Profile Ray_GTI-R
Avatar
Send message
Joined: 5 Nov 10
Posts: 56
Credit: 10,055,630
RAC: 39,138
Message 54797 - Posted: 16 Jun 2012 | 22:14:45 UTC - in response to Message 54795.

... it's just software.

Exactly the point I have been trying to make for a couple of weeks. And exactly why I did the latest exercise just to prove it's not hardware.
Specifically, it's new MW@H tasks.
Older MW@H tasks worked fine on the same hardware & W7 64-bit:- see my earlier posts which explain it all.

I use a 3850 AGP on WinXP32. (only to let ya know it is not 64bit only).

dskagcommunity ... looking back on your posts, your errors are different from the ones I got. Compare my working BOINC version/driver details posted earlier with yours. Also worth checking that all .NET updates are, erm, up to date, then flush old tasks and try the latest ... at various points in the past couple of weeks there have been a few false starts with generating new WUs that work.

nanoprobe
Avatar
Send message
Joined: 27 Jan 12
Posts: 34
Credit: 4,224,533
RAC: 6,942
Message 54798 - Posted: 16 Jun 2012 | 22:17:30 UTC

Here is what I was referring to with the wattage fluctuations on the 1.02 app. Is this normal, and if so, why?

Profile Ray_GTI-R
Avatar
Send message
Joined: 5 Nov 10
Posts: 56
Credit: 10,055,630
RAC: 39,138
Message 54801 - Posted: 17 Jun 2012 | 1:22:01 UTC
Last modified: 17 Jun 2012 | 1:55:34 UTC

Even using an app_info to try to control GPU% didn't help.

I use ATI Tray Tools rather than BOINC settings.
With ATI Tray Tools I can:-
a) set a custom fan profile (the fan profile in the standard driver had a major flaw)
b) set a custom VDDC (I'm currently running one HD3850 at 0.975V at full speed and a "problem", hot-running HD3850 at 1.014V) rather than the standard 1.254V
c) set the GPU & memory speeds (to +/- 2 MHz. The "problem" HD3850 has a GPU speed set to 567 MHz rather than the 669 MHz standard & memory at full speed)
With ATI Tray Tools set as above these cards do not exceed 55C at +/- 100% load, fan @ 60% max with lots of fresh, filtered airflow. OK, it's a cool summer here in the UK - overnight ambient 16.6C & breezy :-)

Why? Hot=inefficient=more current IIRC.
Sorry, no idea why 0.82 is better than 1.02 by 78W/0.65A @ 119V AC
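
For reference, the 78W/0.65A pairing is just I = P / V at 119V. A minimal sketch of that arithmetic, assuming a power factor of 1 (an idealisation; a real PC power supply will differ somewhat):

```python
def current_amps(power_watts, voltage_volts):
    """Current drawn for a given power at a given voltage, I = P / V.

    Assumes a power factor of 1, which is an idealisation.
    """
    return power_watts / voltage_volts


# 78 W difference at 119 V AC, the figures quoted above.
extra_current = current_amps(78.0, 119.0)
print(f"{extra_current:.2f} A")
```

That comes out at about two thirds of an amp, in line with the figure quoted.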

Profile Ray_GTI-R
Avatar
Send message
Joined: 5 Nov 10
Posts: 56
Credit: 10,055,630
RAC: 39,138
Message 54808 - Posted: 18 Jun 2012 | 1:27:37 UTC

So anyway ...

W7 64-bit/HD3850 AGP & existing MW@H GPU tasks ALL fail with "Computation error"
whereas ...
XP 32-bit/HD3850 AGP & existing MW@H GPU tasks work OK.
All SETI/Catalyst/etc the same on the same hardware.

What are your thoughts?

Dr Who Fan
Avatar
Send message
Joined: 8 Aug 08
Posts: 3
Credit: 37,877
RAC: 0
Message 54830 - Posted: 20 Jun 2012 | 0:18:32 UTC

CPU version still failing on Linux/Ubuntu 12.04 using BOINC 7.0.28:
ps_separation_14_2s_05_3_test_1_1339497601_4215469_1

Stderr output
<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
<search_application> milkyway_separation 1.00 Linux x86 double </search_application>
Unrecognized XML in project preferences: max_gfx_cpu_pct
Skipping: 50
Skipping: /max_gfx_cpu_pct
Unrecognized XML in project preferences: allow_non_preferred_apps
Skipping: 1
Skipping: /allow_non_preferred_apps
Error loading Lua script 'astronomy_parameters.txt': [string "argv = {...}..."]:1: arguments not set
Error reading astronomy parameters from file 'astronomy_parameters.txt'

Trying old parameters file
Error reading number_parameters

05:55:37 (7350): called boinc_finish

</stderr_txt>
]]>
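
The stderr dumps posted in this thread show at least three different failures that all surface in BOINC as the same "Computation error". A rough sketch for telling them apart from the stderr text; the signature strings are copied from the dumps in this thread, while the labels and the helper name `classify_stderr` are informal and hypothetical, not anything the project provides:

```python
# Signature strings copied from stderr dumps in this thread. The labels are
# informal descriptions, not official error names.
KNOWN_SIGNATURES = {
    "Failed to read number of star points from file": "star file missing or unreadable",
    "CL_BUILD_PROGRAM_FAILURE": "OpenCL kernel failed to compile",
    "Error loading Lua script": "parameter file could not be loaded",
}


def classify_stderr(stderr_text):
    """Return the labels of every known signature found in the text."""
    return [
        label
        for signature, label in KNOWN_SIGNATURES.items()
        if signature in stderr_text
    ]


# Excerpt from the Linux CPU dump above.
sample = """Error loading Lua script 'astronomy_parameters.txt': [string "argv = {...}..."]:1: arguments not set
Error reading astronomy parameters from file 'astronomy_parameters.txt'
"""

print(classify_stderr(sample))
```

Quoting the matching line when reporting an error makes it easier to see whether a report belongs to the star-file problem, the OpenCL build problem, or the parameter-file problem.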

____________

Profile Dale Jake Corner
Send message
Joined: 25 Aug 11
Posts: 3
Credit: 1,758,322
RAC: 3,190
Message 54840 - Posted: 21 Jun 2012 | 5:54:51 UTC

I am currently getting no computational errors involving the work units. In fact I have not had a single error on a computational unit since the "test" series began.

I believe you have found the fix for the problem.

Dale

Profile dskagcommunity
Avatar
Send message
Joined: 26 Feb 11
Posts: 155
Credit: 32,274,068
RAC: 167,890
Message 54843 - Posted: 22 Jun 2012 | 9:46:56 UTC
Last modified: 22 Jun 2012 | 9:49:42 UTC

Just as an update: I tried upgrading to 7.0.25, but nothing changed with my "100% but computing never ends" problem on MW *sigh*. It seems I must keep looking to build or find a new OpenCL-compatible MW system for well under 100 euro.
____________
DSKAG Austria Research Team: http://www.research.dskag.at




Copyright © 2013 AstroInformatics Group