Bad WUs
Author | Message |
---|---|
Joined: 15 Jul 08 Posts: 383 Credit: 729,293,740 RAC: 0 |
In the last day I've been getting a number of "error while computing" WUs on all my machines. All other machines are failing these same WUs. Here's an example: http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=20534102 |
Joined: 6 May 09 Posts: 217 Credit: 6,856,375 RAC: 0 |
Unfortunately, the server clears out finished WU info quickly. Can you post the names of these WUs? They should start with "de_separation" or "de_nbody". Thanks. -Matthew |
Joined: 28 Aug 09 Posts: 23 Credit: 1,263,508,642 RAC: 0 |
I just got two of them: de_separation_10_3s_free_2_420330_1305301089_1 and de_separation_10_3s_free_2_423122_1305301520_0. The stderr log is the following: <core_client_version>6.10.60</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> <search_application> milkywayathome_client separation 0.62 Windows x86 double CAL++ </search_application> Found 1 CAL devices Chose device 0 Device target: CAL_TARGET_CAYMAN Revision: 1 CAL Version: 1.4.900 Engine clock: 810 Mhz Memory clock: 1100 Mhz GPU RAM: 2048 Wavefront size: 64 Double precision: CAL_TRUE Compute shader: CAL_TRUE Number SIMD: 22 Number shader engines: 2 Pitch alignment: 256 Surface alignment: 4096 Max size 2D: { 16384, 16384 } Estimated iteration time 119.647166 ms Target frequency 30.000000 Hz, polling mode 1, using responsiveness factor of 1.000000 Dividing into 4 chunks Integration range: { nu_steps = 640, mu_steps = 1600, r_steps = 1400 } Using { 1, 4 } chunk(s) of size { 1400, 400 } Integration time = 82.015017 s, average per iteration = 128.148464 ms Integral 0 time = 83.759557 s Estimated iteration time 29.911792 ms Target frequency 30.000000 Hz, polling mode 1, using responsiveness factor of 1.000000 Dividing into 1 chunks Integration range: { nu_steps = 640, mu_steps = 400, r_steps = 1400 } Using { 1, 1 } chunk(s) of size { 1400, 400 } Integration time = 20.501572 s, average per iteration = 32.033706 ms Integral 1 time = 20.980437 s Likelihood time = 6.569375 s Non-finite result Failed to calculate likelihood <background_integral> 0.000972832098789 </background_integral> <stream_integral> 27.656173175642195 1183.863218271929800 -0.138228459499025 </stream_integral> <background_likelihood> -2.983789565586555 </background_likelihood> <stream_only_likelihood> -58.146684859558270 -7.945487711078723 -1.#IND00000000000 </stream_only_likelihood> <search_likelihood> -1.#IND00000000000 </search_likelihood> 18:19:23 (2796): called boinc_finish </stderr_txt> |
Joined: 24 Dec 07 Posts: 1947 Credit: 240,884,648 RAC: 0 |
de_separation_10_3s_free_2_423096_1305301520 de_separation_10_3s_free_2_414045_1305300192 de_separation_13_3s_free_2_440995_1305303756 <-empty stderr output de_separation_13_3s_fix20_1_4605109_1304741455 (too many total results) |
Joined: 15 Jul 08 Posts: 383 Credit: 729,293,740 RAC: 0 |
Unfortunately, the server clears out finished WU info quickly. Can you post the names of these WUs? They should start with "de_separation" or "de_nbody". Thanks. There are getting to be a LOT of them. Here are a few: de_separation_10_3s_free_2_267219_1305277529 de_separation_10_3s_free_2_345488_1305289665 de_separation_10_3s_free_2_372388_1305293790 de_separation_10_3s_free_2_339020_1305288748 de_separation_10_3s_free_2_399297_1305297962 de_separation_10_3s_free_2_396418_1305297505 de_separation_10_3s_free_2_414226_1305300192 de_separation_10_3s_free_2_423450_1305301520 de_separation_10_3s_free_2_420075_1305301086 de_separation_10_3s_free_2_438158_1305303752 de_separation_10_3s_free_2_447027_1305305089 Some of the bad WUs I've gotten on several machines, all failing. They fail on every machine they're sent to. |
Joined: 15 Jul 08 Posts: 383 Credit: 729,293,740 RAC: 0 |
More bad ones: de_separation_10_3s_free_2_432480_1305302870 de_separation_17_3s_fix_5_46_1305223043 de_separation_17_3s_fix_5_142_1305223043 de_separation_10_3s_free_2_414220_1305300192 de_separation_10_3s_free_2_453301_1305305955 de_separation_10_3s_free_2_453153_1305305955 de_separation_10_3s_free_2_459484_1305306841 de_separation_10_3s_free_2_438360_1305303752 de_separation_10_3s_free_2_447410_1305305089 de_separation_10_3s_free_2_441307_1305304196 de_separation_10_3s_free_2_396418_1305297505 de_separation_10_3s_free_2_438470_1305303752 de_separation_10_3s_free_2_438158_1305303752 de_separation_10_3s_free_2_468444_1305308188 de_separation_10_3s_free_2_465352_1305307741 |
Joined: 19 Feb 09 Posts: 32 Credit: 32,843,308 RAC: 0 |
A few from me (same error as mentioned above): de_separation_10_3s_free_2_453478_1305305955_1 de_separation_10_3s_free_2_462196_1305307281_0 de_separation_10_3s_free_2_420164_1305301086_1 Error example: <core_client_version>6.12.26</core_client_version> <![CDATA[ <message> Unzulässige Funktion. (0x1) - exit code 1 (0x1) </message> <stderr_txt> <search_application> milkywayathome_client separation 0.62 Windows x86 double CAL++ </search_application> Found 1 CAL devices Chose device 0 Device target: CAL_TARGET_770 Revision: 2 CAL Version: 1.4.1385 Engine clock: 750 Mhz Memory clock: 900 Mhz GPU RAM: 1024 Wavefront size: 64 Double precision: CAL_TRUE Compute shader: CAL_TRUE Number SIMD: 10 Number shader engines: 1 Pitch alignment: 256 Surface alignment: 4096 Max size 2D: { 8192, 8192 } Estimated iteration time 284.281667 ms Target frequency 40.000000 Hz, polling mode 1, using responsiveness factor of 1.000000 Dividing into 16 chunks Integration range: { nu_steps = 640, mu_steps = 1600, r_steps = 1400 } Using { 1, 16 } chunk(s) of size { 1400, 100 } Integration time = 206.497489 s, average per iteration = 322.652326 ms Integral 0 time = 208.028925 s Estimated iteration time 71.070417 ms Target frequency 40.000000 Hz, polling mode 1, using responsiveness factor of 1.000000 Dividing into 2 chunks Integration range: { nu_steps = 640, mu_steps = 400, r_steps = 1400 } Using { 1, 2 } chunk(s) of size { 1400, 200 } Integration time = 51.262623 s, average per iteration = 80.097848 ms Integral 1 time = 51.766143 s Likelihood time = 8.690186 s Non-finite result Failed to calculate likelihood <background_integral> 0.001134262006206 </background_integral> <stream_integral> 24.680924661083985 1331.077010865039700 -0.317058930291751 </stream_integral> <background_likelihood> -3.043218251293072 </background_likelihood> <stream_only_likelihood> -10.208126603949683 -8.771491003836179 -1.#IND00000000000 </stream_only_likelihood> <search_likelihood> -1.#IND00000000000 </search_likelihood> 19:26:59 (3724): called boinc_finish </stderr_txt> ]]> Starfire |
Joined: 15 Jul 08 Posts: 383 Credit: 729,293,740 RAC: 0 |
Even more: de_separation_10_3s_free_2_456101_1305306387 de_separation_10_3s_free_2_486165_1305310853 de_separation_10_3s_free_2_480196_1305309952 de_separation_10_3s_free_2_471204_1305308645 de_separation_10_3s_free_2_474195_1305309078 de_separation_10_3s_free_2_453159_1305305955 de_separation_10_3s_free_2_486384_1305310854 de_separation_10_3s_free_2_474195_1305309078 de_separation_10_3s_free_2_495316_1305312172 de_separation_10_3s_free_2_429487_1305302425 de_separation_10_3s_free_2_495368_1305312172 de_separation_10_3s_free_2_501487_1305313065 de_separation_10_3s_free_2_501086_1305313065 de_separation_10_3s_free_2_498477_1305312620 de_separation_10_3s_free_2_498218_1305312619 de_separation_10_3s_free_2_486495_1305310854 de_separation_10_3s_free_2_462254_1305307281 de_separation_10_3s_free_2_483065_1305310394 de_separation_10_3s_free_2_489441_1305311299 de_separation_10_3s_free_2_477137_1305309511 de_separation_10_3s_free_2_477479_1305309511 de_separation_10_3s_free_2_486494_1305310854 It's getting worse... |
Joined: 16 Dec 10 Posts: 46 Credit: 205,697,511 RAC: 0 |
I've also had some error WUs... |
Joined: 6 May 09 Posts: 217 Credit: 6,856,375 RAC: 0 |
I'm going to shut down the "de_separation_10_3s_free_2..." runs. It may take a bit for the remaining WUs to filter out of the system. -Matthew |
Joined: 16 Dec 10 Posts: 46 Credit: 205,697,511 RAC: 0 |
Fastest service in the BOINC-cosmos. THX!!!! |
Joined: 15 Jul 08 Posts: 383 Credit: 729,293,740 RAC: 0 |
I'm going to shut down the "de_separation_10_3s_free_2..." runs. It may take a bit for the remaining WUs to filter out of the system. Yep, the bad WUs are still coming. So far I have had 93+ of these error out today. Unfortunately, they error at the end, and a few have even "gotten stuck" and run for hours instead of the usual few minutes. |
Joined: 18 Jul 09 Posts: 300 Credit: 303,562,776 RAC: 1 |
Me too... de_separation_10_3s_free_2_465180_1305307741 Stderr output <core_client_version>6.10.60</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> <search_application> milkywayathome_client separation 0.62 Windows x86 double CAL++ </search_application> Found 1 CAL devices Chose device 0 Device target: CAL_TARGET_CYPRESS Revision: 2 CAL Version: 1.4.1332 Engine clock: 900 Mhz Memory clock: 900 Mhz GPU RAM: 1024 Wavefront size: 64 Double precision: CAL_TRUE Compute shader: CAL_TRUE Number SIMD: 20 Number shader engines: 2 Pitch alignment: 256 Surface alignment: 4096 Max size 2D: { 16384, 16384 } Estimated iteration time 118.450694 ms Target frequency 30.000000 Hz, polling mode 1, using responsiveness factor of 1.000000 Dividing into 4 chunks Integration range: { nu_steps = 640, mu_steps = 1600, r_steps = 1400 } Using { 1, 4 } chunk(s) of size { 1400, 400 } <search_application> milkywayathome_client separation 0.62 Windows x86 double CAL++ </search_application> Found 1 CAL devices Chose device 0 Device target: CAL_TARGET_CYPRESS Revision: 2 CAL Version: 1.4.1332 Engine clock: 850 Mhz Memory clock: 1200 Mhz GPU RAM: 1024 Wavefront size: 64 Double precision: CAL_TRUE Compute shader: CAL_TRUE Number SIMD: 20 Number shader engines: 2 Pitch alignment: 256 Surface alignment: 4096 Max size 2D: { 16384, 16384 } Estimated iteration time 125.418382 ms Target frequency 30.000000 Hz, polling mode 1, using responsiveness factor of 1.000000 Dividing into 4 chunks Integration range: { nu_steps = 640, mu_steps = 1600, r_steps = 1400 } Using { 1, 4 } chunk(s) of size { 1400, 400 } Integration time = 162.473890 s, average per iteration = 253.865454 ms Integral 0 time = 166.723979 s Estimated iteration time 31.354596 ms Target frequency 30.000000 Hz, polling mode 1, using responsiveness factor of 1.000000 Dividing into 1 chunks Integration range: { nu_steps = 640, mu_steps = 400, r_steps = 1400 } Using { 1, 1 } chunk(s) of size { 1400, 400 } Integration time = 39.717190 s, average per iteration = 62.058109 ms Integral 1 time = 40.690802 s Likelihood time = 13.151326 s Non-finite result Failed to calculate likelihood <background_integral> 0.001066791568732 </background_integral> <stream_integral> 197.666267930748210 200.845326878907000 -0.015171049569297 </stream_integral> <background_likelihood> -3.098990078556063 </background_likelihood> <stream_only_likelihood> -27.787052195283170 -6.582193972905476 -1.#IND00000000000 </stream_only_likelihood> <search_likelihood> -1.#IND00000000000 </search_likelihood> 12:57:13 (4060): called boinc_finish </stderr_txt> ]]> |
Joined: 15 Jul 08 Posts: 383 Credit: 729,293,740 RAC: 0 |
I'm going to shut down the "de_separation_10_3s_free_2..." runs. It may take a bit for the remaining WUs to filter out of the system. Thanks. Problem is, the "de_separation_17_3s_fix_5" WUs are worse. Not only do they fail, but they sometimes run for hours, tying up the GPU :( |
Joined: 26 Oct 09 Posts: 55 Credit: 352,166,802 RAC: 0 |
I'm going to shut down the "de_separation_10_3s_free_2..." runs. It may take a bit for the remaining WUs to filter out of the system. Just aborted one that had been running 1 hour 45 mins on one of my 5870s... -Dave |
Joined: 24 Jul 10 Posts: 21 Credit: 465,205 RAC: 0 |
I've had several fail the same way as previously mentioned here. I aborted the rest and, for now at least, it appears that the de_separation ones have stopped downloading. I've noticed that one's account only goes back one or two WUs, so it's difficult to check for a pattern. Peter Toronto, Canada |
Joined: 24 Jul 10 Posts: 21 Credit: 465,205 RAC: 0 |
This is ridiculous... now de_nbody_orphan_test_2model_4_50204_1305471400_1 has failed, and so did one before that. I've aborted another one since and am turning off work fetch until someone sorts this mess out. Peter Toronto, Canada |
Joined: 3 Oct 10 Posts: 42 Credit: 320,242 RAC: 0 |
Where do you find these STDERR logs? I can't find them in the BOINC folder or in 'My Documents'. 32bit Windows XP Home AMD Opteron 180 ASUS A8N-SLI Motherboard Nvidia 450GTS GPU 4GB DDR Memory |
Joined: 28 Feb 10 Posts: 120 Credit: 109,840,492 RAC: 0 |
You'll find it in the BOINC data directory. On Windows XP the default is C:\Documents and Settings\All Users\Application Data\BOINC |
Joined: 24 Feb 09 Posts: 620 Credit: 100,587,625 RAC: 0 |
Where do you find these STDERR logs? I can't find them in the BOINC folder or in 'My Documents'. For an individual WU's stderr, go to: Account Page - click Computers - click Tasks [of the PC you are interested in] - look for the Task column on the top left - go to the WU you are interested in - click the blue number - you're there. Stderr only shows for individual WUs when work has been done - or failed :) Regards Zy |
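
For anyone collecting these logs: the failed tasks quoted in this thread all end with a `-1.#IND...` value in the `<stream_only_likelihood>` and `<search_likelihood>` fields, which is how older Microsoft C runtimes print a NaN. Below is a minimal Python sketch for flagging that in a saved copy of a task's stderr text. It is not part of BOINC or the MilkyWay@home client; the function name `non_finite_fields` and both regular expressions are assumptions made purely for illustration.

```python
import re

# Matches simple fields such as "<search_likelihood> ... </search_likelihood>".
# The body may not contain "<", so wrapper tags like <stderr_txt> are skipped
# and only the innermost fields are reported.
TAG_RE = re.compile(r"<(\w+)>([^<]*)</\1>")

# Older Microsoft C runtimes print non-finite doubles as 1.#IND, 1.#QNAN,
# 1.#SNAN or 1.#INF; other platforms usually print "nan" or "inf".
NON_FINITE_RE = re.compile(r"#(IND|INF|QNAN|SNAN)|\bnan\b|\binf\b", re.IGNORECASE)


def non_finite_fields(stderr_text):
    """Return {field_name: [offending tokens]} for fields holding non-finite values."""
    bad = {}
    for tag, body in TAG_RE.findall(stderr_text):
        tokens = [tok for tok in body.split() if NON_FINITE_RE.search(tok)]
        if tokens:
            bad[tag] = tokens
    return bad


if __name__ == "__main__":
    # Fragment copied from one of the logs posted above.
    sample = (
        "<background_likelihood> -2.983789565586555 </background_likelihood> "
        "<stream_only_likelihood> -58.146684859558270 -7.945487711078723 "
        "-1.#IND00000000000 </stream_only_likelihood> "
        "<search_likelihood> -1.#IND00000000000 </search_likelihood>"
    )
    for field, tokens in non_finite_fields(sample).items():
        print(field, tokens)
```

An empty dictionary means no non-finite markers were found. This only spots the symptom in the log; it says nothing about why a given run produced a NaN.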
©2023 Astroinformatics Group