Welcome to MilkyWay@home

Bad WUs

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 729,293,740
RAC: 0
Message 48738 - Posted: 13 May 2011, 16:01:28 UTC

In the last day I've been getting a number of "error while computing" WUs on all my machines. All other machines are failing these same WUs. Here's an example:

http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=20534102

Matthew
Volunteer moderator
Project developer
Project scientist
Joined: 6 May 09
Posts: 217
Credit: 6,856,375
RAC: 0
Message 48739 - Posted: 13 May 2011, 16:27:46 UTC - in response to Message 48738.  

Unfortunately, the server clears out finished WU info quickly. Can you post the names of these WUs? They should start with "de_separation" or "de_nbody". Thanks.

-Matthew

valterc
Joined: 28 Aug 09
Posts: 23
Credit: 1,266,152,880
RAC: 128,910
Message 48740 - Posted: 13 May 2011, 16:43:28 UTC - in response to Message 48739.  

I just got two of them:

de_separation_10_3s_free_2_420330_1305301089_1
de_separation_10_3s_free_2_423122_1305301520_0

The stderr log is as follows:

<core_client_version>6.10.60</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
<search_application> milkywayathome_client separation 0.62 Windows x86 double CAL++ </search_application>
Found 1 CAL devices
Chose device 0

Device target:         CAL_TARGET_CAYMAN
Revision:              1
CAL Version:           1.4.900
Engine clock:          810 Mhz
Memory clock:          1100 Mhz
GPU RAM:               2048
Wavefront size:        64
Double precision:      CAL_TRUE
Compute shader:        CAL_TRUE
Number SIMD:           22
Number shader engines: 2
Pitch alignment:       256
Surface alignment:     4096
Max size 2D:           { 16384, 16384 }

Estimated iteration time 119.647166 ms
Target frequency 30.000000 Hz, polling mode 1, using responsiveness factor of 1.000000
Dividing into 4 chunks
Integration range: { nu_steps = 640, mu_steps = 1600, r_steps = 1400 }
Using { 1, 4 } chunk(s) of size { 1400, 400 }
Integration time = 82.015017 s, average per iteration = 128.148464 ms
Integral 0 time = 83.759557 s
Estimated iteration time 29.911792 ms
Target frequency 30.000000 Hz, polling mode 1, using responsiveness factor of 1.000000
Dividing into 1 chunks
Integration range: { nu_steps = 640, mu_steps = 400, r_steps = 1400 }
Using { 1, 1 } chunk(s) of size { 1400, 400 }
Integration time = 20.501572 s, average per iteration = 32.033706 ms
Integral 1 time = 20.980437 s
Likelihood time = 6.569375 s
Non-finite result
Failed to calculate likelihood
<background_integral> 0.000972832098789 </background_integral>
<stream_integral>  27.656173175642195  1183.863218271929800  -0.138228459499025 </stream_integral>
<background_likelihood> -2.983789565586555 </background_likelihood>
<stream_only_likelihood>  -58.146684859558270  -7.945487711078723  -1.#IND00000000000 </stream_only_likelihood>
<search_likelihood> -1.#IND00000000000 </search_likelihood>
18:19:23 (2796): called boinc_finish

</stderr_txt>

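The `-1.#IND` values in the log above are how older Microsoft C runtimes print an indeterminate NaN, which is why the client reports "Non-finite result". As a rough sketch only (this is not part of the MilkyWay@home client, and the function names are made up for illustration), a pasted stderr dump can be scanned for the fields that went non-finite:

```python
import math
import re

# Matches simple <tag> body </tag> pairs whose body contains no nested tags.
TAG_RE = re.compile(r"<(\w+)>([^<]*)</\1>")


def parse_value(token):
    """Convert one token to float, handling MSVC-style NaN/Inf spellings."""
    if "#INF" in token:
        return -math.inf if token.startswith("-") else math.inf
    if "#IND" in token or "#QNAN" in token or "#SNAN" in token:
        return math.nan
    return float(token)


def non_finite_fields(stderr_text):
    """Map tag name -> indexes of values that are NaN or infinite."""
    result = {}
    for tag, body in TAG_RE.findall(stderr_text):
        try:
            values = [parse_value(tok) for tok in body.split()]
        except ValueError:
            continue  # non-numeric tag, e.g. <search_application>
        bad = [i for i, v in enumerate(values) if not math.isfinite(v)]
        if bad:
            result[tag] = bad
    return result
```

Run against the log above, this flags the third `stream_only_likelihood` value and the single `search_likelihood` value; the integrals and the background likelihood are all finite.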

The Gas Giant
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 48741 - Posted: 13 May 2011, 16:56:32 UTC
Last modified: 13 May 2011, 16:58:33 UTC

de_separation_10_3s_free_2_423096_1305301520
de_separation_10_3s_free_2_414045_1305300192

de_separation_13_3s_free_2_440995_1305303756 <-empty stderr output

de_separation_13_3s_fix20_1_4605109_1304741455 (too many total results)

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 729,293,740
RAC: 0
Message 48742 - Posted: 13 May 2011, 17:29:47 UTC - in response to Message 48739.  

Unfortunately, the server clears out finished WU info quickly. Can you post the names of these WUs? They should start with "de_separation" or "de_nbody". Thanks.

-Matthew

There's getting to be a LOT of them:

Here are a few:

de_separation_10_3s_free_2_267219_1305277529
de_separation_10_3s_free_2_345488_1305289665
de_separation_10_3s_free_2_372388_1305293790
de_separation_10_3s_free_2_339020_1305288748
de_separation_10_3s_free_2_399297_1305297962
de_separation_10_3s_free_2_396418_1305297505
de_separation_10_3s_free_2_414226_1305300192
de_separation_10_3s_free_2_423450_1305301520
de_separation_10_3s_free_2_420075_1305301086
de_separation_10_3s_free_2_438158_1305303752
de_separation_10_3s_free_2_447027_1305305089

Some of the bad WUs I've gotten on several machines, all failing. They fail on every machine they're sent to.
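With lists this long, it helps to tally the failures by run so the affected runs stand out. The sketch below is only an illustration, and the naming scheme is inferred from the names posted in this thread rather than from project documentation: everything before the last numeric fields is treated as the run name, the ten-digit field is assumed to be a Unix timestamp, and a trailing `_0` or `_1` is assumed to be the result index. The function names are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# <run>_<sequence>_<10-digit timestamp>[_<result index>]  (assumed layout)
WU_RE = re.compile(
    r"(?P<run>.+)_(?P<seq>\d+)_(?P<created>\d{10})(?:_(?P<result>\d+))?"
)


def parse_wu_name(name):
    """Split a WU or result name into its assumed parts, or return None."""
    m = WU_RE.fullmatch(name.strip())
    if m is None:
        return None
    return {
        "run": m["run"],
        "seq": int(m["seq"]),
        # Assumption: this field is seconds since the Unix epoch (UTC).
        "created": datetime.fromtimestamp(int(m["created"]), tz=timezone.utc),
        "result": int(m["result"]) if m["result"] is not None else None,
    }


def failures_by_run(names):
    """Count how many of the given names belong to each run."""
    parsed = (parse_wu_name(n) for n in names)
    return Counter(p["run"] for p in parsed if p is not None)
```

Calling `failures_by_run(...)` on the names posted so far and then `.most_common()` on the result lists the runs with the most failures first.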

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 729,293,740
RAC: 0
Message 48743 - Posted: 13 May 2011, 17:43:01 UTC
Last modified: 13 May 2011, 18:14:19 UTC

More bad ones:

de_separation_10_3s_free_2_432480_1305302870
de_separation_17_3s_fix_5_46_1305223043
de_separation_17_3s_fix_5_142_1305223043
de_separation_10_3s_free_2_414220_1305300192
de_separation_10_3s_free_2_453301_1305305955
de_separation_10_3s_free_2_453153_1305305955
de_separation_10_3s_free_2_459484_1305306841
de_separation_10_3s_free_2_438360_1305303752
de_separation_10_3s_free_2_447410_1305305089
de_separation_10_3s_free_2_441307_1305304196
de_separation_10_3s_free_2_396418_1305297505
de_separation_10_3s_free_2_438470_1305303752
de_separation_10_3s_free_2_438158_1305303752
de_separation_10_3s_free_2_468444_1305308188
de_separation_10_3s_free_2_465352_1305307741

Starfire
Joined: 19 Feb 09
Posts: 32
Credit: 32,843,308
RAC: 0
Message 48746 - Posted: 13 May 2011, 18:43:20 UTC

A few from me (same error as mentioned above):

de_separation_10_3s_free_2_453478_1305305955_1
de_separation_10_3s_free_2_462196_1305307281_0
de_separation_10_3s_free_2_420164_1305301086_1


Error example:
<core_client_version>6.12.26</core_client_version>
<![CDATA[
<message>
Unzulässige Funktion. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
<search_application> milkywayathome_client separation 0.62 Windows x86 double CAL++ </search_application>
Found 1 CAL devices
Chose device 0

Device target:         CAL_TARGET_770
Revision:              2
CAL Version:           1.4.1385
Engine clock:          750 Mhz
Memory clock:          900 Mhz
GPU RAM:               1024
Wavefront size:        64
Double precision:      CAL_TRUE
Compute shader:        CAL_TRUE
Number SIMD:           10
Number shader engines: 1
Pitch alignment:       256
Surface alignment:     4096
Max size 2D:           { 8192, 8192 }

Estimated iteration time 284.281667 ms
Target frequency 40.000000 Hz, polling mode 1, using responsiveness factor of 1.000000
Dividing into 16 chunks
Integration range: { nu_steps = 640, mu_steps = 1600, r_steps = 1400 }
Using { 1, 16 } chunk(s) of size { 1400, 100 }
Integration time = 206.497489 s, average per iteration = 322.652326 ms
Integral 0 time = 208.028925 s
Estimated iteration time 71.070417 ms
Target frequency 40.000000 Hz, polling mode 1, using responsiveness factor of 1.000000
Dividing into 2 chunks
Integration range: { nu_steps = 640, mu_steps = 400, r_steps = 1400 }
Using { 1, 2 } chunk(s) of size { 1400, 200 }
Integration time = 51.262623 s, average per iteration = 80.097848 ms
Integral 1 time = 51.766143 s
Likelihood time = 8.690186 s
Non-finite result
Failed to calculate likelihood
<background_integral> 0.001134262006206 </background_integral>
<stream_integral>  24.680924661083985  1331.077010865039700  -0.317058930291751 </stream_integral>
<background_likelihood> -3.043218251293072 </background_likelihood>
<stream_only_likelihood>  -10.208126603949683  -8.771491003836179  -1.#IND00000000000 </stream_only_likelihood>
<search_likelihood> -1.#IND00000000000 </search_likelihood>
19:26:59 (3724): called boinc_finish

</stderr_txt>
]]>

Starfire

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 729,293,740
RAC: 0
Message 48747 - Posted: 13 May 2011, 18:56:54 UTC
Last modified: 13 May 2011, 19:53:13 UTC

Even more:

de_separation_10_3s_free_2_456101_1305306387
de_separation_10_3s_free_2_486165_1305310853
de_separation_10_3s_free_2_480196_1305309952
de_separation_10_3s_free_2_471204_1305308645
de_separation_10_3s_free_2_474195_1305309078
de_separation_10_3s_free_2_453159_1305305955
de_separation_10_3s_free_2_486384_1305310854
de_separation_10_3s_free_2_474195_1305309078
de_separation_10_3s_free_2_495316_1305312172
de_separation_10_3s_free_2_429487_1305302425
de_separation_10_3s_free_2_495368_1305312172
de_separation_10_3s_free_2_501487_1305313065
de_separation_10_3s_free_2_501086_1305313065
de_separation_10_3s_free_2_498477_1305312620
de_separation_10_3s_free_2_498218_1305312619
de_separation_10_3s_free_2_486495_1305310854
de_separation_10_3s_free_2_462254_1305307281
de_separation_10_3s_free_2_483065_1305310394
de_separation_10_3s_free_2_489441_1305311299
de_separation_10_3s_free_2_477137_1305309511
de_separation_10_3s_free_2_477479_1305309511
de_separation_10_3s_free_2_486494_1305310854

It's getting worse...

Chris Skull
Joined: 16 Dec 10
Posts: 46
Credit: 205,697,511
RAC: 0
Message 48748 - Posted: 13 May 2011, 19:14:13 UTC

I also have some error WUs...


Matthew
Volunteer moderator
Project developer
Project scientist
Joined: 6 May 09
Posts: 217
Credit: 6,856,375
RAC: 0
Message 48749 - Posted: 13 May 2011, 19:16:05 UTC

I'm going to shut down the "de_separation_10_3s_free_2..." runs. It may take a bit for the remaining WUs to filter out of the system.

-Matthew

Chris Skull
Joined: 16 Dec 10
Posts: 46
Credit: 205,697,511
RAC: 0
Message 48750 - Posted: 13 May 2011, 19:31:21 UTC

Fastest service in the BOINC-cosmos THX !!!!


Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 729,293,740
RAC: 0
Message 48753 - Posted: 13 May 2011, 20:45:07 UTC - in response to Message 48749.  

I'm going to shut down the "de_separation_10_3s_free_2..." runs. It may take a bit for the remaining WUs to filter out of the system.

-Matthew

Yep, the bad WUs are still coming. So far I've had 93+ of these error out today. Unfortunately they error at the end, and a few have even "gotten stuck" and run for hours instead of the usual few minutes.

swiftmallard
Joined: 18 Jul 09
Posts: 300
Credit: 303,562,776
RAC: 0
Message 48758 - Posted: 14 May 2011, 1:54:00 UTC
Last modified: 14 May 2011, 1:55:37 UTC

Me too...

de_separation_10_3s_free_2_465180_1305307741

Stderr output

<core_client_version>6.10.60</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
<search_application> milkywayathome_client separation 0.62 Windows x86 double CAL++ </search_application>
Found 1 CAL devices
Chose device 0

Device target: CAL_TARGET_CYPRESS
Revision: 2
CAL Version: 1.4.1332
Engine clock: 900 Mhz
Memory clock: 900 Mhz
GPU RAM: 1024
Wavefront size: 64
Double precision: CAL_TRUE
Compute shader: CAL_TRUE
Number SIMD: 20
Number shader engines: 2
Pitch alignment: 256
Surface alignment: 4096
Max size 2D: { 16384, 16384 }

Estimated iteration time 118.450694 ms
Target frequency 30.000000 Hz, polling mode 1, using responsiveness factor of 1.000000
Dividing into 4 chunks
Integration range: { nu_steps = 640, mu_steps = 1600, r_steps = 1400 }
Using { 1, 4 } chunk(s) of size { 1400, 400 }
<search_application> milkywayathome_client separation 0.62 Windows x86 double CAL++ </search_application>
Found 1 CAL devices
Chose device 0

Device target: CAL_TARGET_CYPRESS
Revision: 2
CAL Version: 1.4.1332
Engine clock: 850 Mhz
Memory clock: 1200 Mhz
GPU RAM: 1024
Wavefront size: 64
Double precision: CAL_TRUE
Compute shader: CAL_TRUE
Number SIMD: 20
Number shader engines: 2
Pitch alignment: 256
Surface alignment: 4096
Max size 2D: { 16384, 16384 }

Estimated iteration time 125.418382 ms
Target frequency 30.000000 Hz, polling mode 1, using responsiveness factor of 1.000000
Dividing into 4 chunks
Integration range: { nu_steps = 640, mu_steps = 1600, r_steps = 1400 }
Using { 1, 4 } chunk(s) of size { 1400, 400 }
Integration time = 162.473890 s, average per iteration = 253.865454 ms
Integral 0 time = 166.723979 s
Estimated iteration time 31.354596 ms
Target frequency 30.000000 Hz, polling mode 1, using responsiveness factor of 1.000000
Dividing into 1 chunks
Integration range: { nu_steps = 640, mu_steps = 400, r_steps = 1400 }
Using { 1, 1 } chunk(s) of size { 1400, 400 }
Integration time = 39.717190 s, average per iteration = 62.058109 ms
Integral 1 time = 40.690802 s
Likelihood time = 13.151326 s
Non-finite result
Failed to calculate likelihood
<background_integral> 0.001066791568732 </background_integral>
<stream_integral> 197.666267930748210 200.845326878907000 -0.015171049569297 </stream_integral>
<background_likelihood> -3.098990078556063 </background_likelihood>
<stream_only_likelihood> -27.787052195283170 -6.582193972905476 -1.#IND00000000000 </stream_only_likelihood>
<search_likelihood> -1.#IND00000000000 </search_likelihood>
12:57:13 (4060): called boinc_finish

</stderr_txt>
]]>

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 729,293,740
RAC: 0
Message 48767 - Posted: 14 May 2011, 19:17:30 UTC - in response to Message 48749.  

I'm going to shut down the "de_separation_10_3s_free_2..." runs. It may take a bit for the remaining WUs to filter out of the system.

-Matthew

Thanks. Problem is, the "de_separation_17_3s_fix_5" WUs are worse. Not only do they fail but they sometimes run for hours, tying up the GPU :(

dnolan
Joined: 26 Oct 09
Posts: 55
Credit: 352,166,802
RAC: 0
Message 48768 - Posted: 14 May 2011, 19:29:23 UTC - in response to Message 48767.  

I'm going to shut down the "de_separation_10_3s_free_2..." runs. It may take a bit for the remaining WUs to filter out of the system.

-Matthew

Thanks. Problem is, the "de_separation_17_3s_fix_5" WUs are worse. Not only do they fail but they sometimes run for hours, tying up the GPU :(


Just aborted one that had been running 1 hour 45 mins on one of my 5870s...

-Dave

Ex_Brit
Joined: 24 Jul 10
Posts: 21
Credit: 465,205
RAC: 0
Message 48781 - Posted: 15 May 2011, 13:32:04 UTC
Last modified: 15 May 2011, 14:22:24 UTC

I've had several fail the same way as previously mentioned here. Aborted the rest and, for now at least, it appears that de_separation ones have stopped downloading.

I've noticed that one's account only goes back one or two WUs, so it's difficult to check for a pattern.
Peter
Toronto, Canada

Ex_Brit
Joined: 24 Jul 10
Posts: 21
Credit: 465,205
RAC: 0
Message 48786 - Posted: 15 May 2011, 18:05:24 UTC - in response to Message 48781.  

This is ridiculous...now de_nbody_orphan_test_2model_4_50204_1305471400_1 failed and one before that. I've aborted another one since and am turning off the work fetch until someone sorts this mess out.
Peter
Toronto, Canada

Chris
Joined: 3 Oct 10
Posts: 42
Credit: 320,242
RAC: 0
Message 49101 - Posted: 29 May 2011, 16:30:39 UTC

Where do you find these STDERR logs? I can't find them in the BOINC folder or in 'My Documents'.
32bit Windows XP Home
AMD Opteron 180
ASUS A8N-SLI Motherboard
Nvidia 450GTS GPU
4GB DDR Memory

FruehwF
Joined: 28 Feb 10
Posts: 120
Credit: 109,840,492
RAC: 0
Message 49118 - Posted: 30 May 2011, 8:54:58 UTC

You'll find it in the BOINC data directory.
For Windows XP the default is
C:\Documents and Settings\All Users\Application Data\BOINC
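As a sketch of how to look there programmatically: this assumes the usual BOINC layout, in which each running task works in a numbered directory under `slots` inside the data directory and the science application writes its `stderr.txt` there. The function names are hypothetical. Once a task finishes the slot is typically cleared, so this only helps for tasks that are still running.

```python
from pathlib import Path


def find_slot_stderr(data_dir):
    """Return stderr.txt paths under <data_dir>/slots/<n>/, sorted."""
    slots = Path(data_dir) / "slots"
    if not slots.is_dir():
        return []  # wrong data directory, or no slots created yet
    return sorted(slots.glob("*/stderr.txt"))


def tail(path, lines=20):
    """Return the last few lines of a log file as a list of strings."""
    text = Path(path).read_text(errors="replace")
    return text.splitlines()[-lines:]
```

For example, `find_slot_stderr(r"C:\Documents and Settings\All Users\Application Data\BOINC")` would list the logs of tasks currently in progress on a default Windows XP install, and `tail(path)` shows the end of one of them.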

Zydor
Joined: 24 Feb 09
Posts: 620
Credit: 100,587,625
RAC: 0
Message 49129 - Posted: 30 May 2011, 23:51:53 UTC - in response to Message 49101.  

Where do you find these STDERR logs? I can't find them in the BOINC folder or in 'My Documents'.


For an individual WU's stderr, go to: Account Page - click Computers - click Tasks [of the PC you are interested in] - look for the Task column at the top left - go to the WU you are interested in - click the blue number - you're there.

Stderr only shows for individual WUs when work has been done - or failed :)

Regards
Zy

©2024 Astroinformatics Group