WU restart every 20%

Michael Yusko
Joined: 27 May 11
Posts: 22
Credit: 51,505,849
RAC: 33,967

Message 65986 - Posted: 6 Dec 2016, 17:24:40 UTC

I start a WU and it will eventually run to completion, but only after it runs to 20%, restarts, runs to 40%, restarts, runs to 60%, restarts, runs to 80%, restarts, then finally runs to 100% and completes. See the log below for an example WU. The restarts happen at exactly 20.00%, 40.00%, etc., too. Thoughts? It's slowing me down dramatically.

12/6/2016 12:09:51 PM | | Starting BOINC client version 7.6.22 for windows_x86_64
12/6/2016 12:09:51 PM | | log flags: file_xfer, sched_ops, task
12/6/2016 12:09:51 PM | | Libraries: libcurl/7.45.0 OpenSSL/1.0.2d zlib/1.2.8
12/6/2016 12:09:51 PM | | Data directory: D:\ProgramData\BOINC
12/6/2016 12:09:51 PM | | Running under account mpyusko
12/6/2016 12:09:51 PM | | OpenCL: AMD/ATI GPU 0: Ellesmere (driver version 2117.14 (VM), device version OpenCL 2.0 AMD-APP (2117.14), 8192MB, 8192MB available, 3865 GFLOPS peak)
12/6/2016 12:09:51 PM | | OpenCL CPU: AMD FX(tm)-8350 Eight-Core Processor (OpenCL driver vendor: Advanced Micro Devices, Inc., driver version 2117.14 (sse2,avx,fma4), device version OpenCL 1.2 AMD-APP (2117.14))
12/6/2016 12:09:52 PM | | Host name: gibson
12/6/2016 12:09:52 PM | | Processor: 8 AuthenticAMD AMD FX(tm)-8350 Eight-Core Processor [Family 21 Model 2 Stepping 0]
12/6/2016 12:09:52 PM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni ssse3 fma cx16 sse4_1 sse4_2 popcnt aes f16c syscall nx lm avx svm sse4a osvw ibs xop skinit wdt lwp fma4 tce tbm topx page1gb rdtscp bmi1
12/6/2016 12:09:52 PM | | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.14393.00)
12/6/2016 12:09:52 PM | | Memory: 15.90 GB physical, 31.90 GB virtual
12/6/2016 12:09:52 PM | | Disk: 931.51 GB total, 374.99 GB free
12/6/2016 12:09:52 PM | | Local time is UTC -5 hours
12/6/2016 12:09:52 PM | | Config: don't compute while hl2.exe is running
12/6/2016 12:09:52 PM | | Config: don't compute while Rage.exe is running
12/6/2016 12:09:52 PM | | Config: don't compute while Rage64.exe is running
12/6/2016 12:09:52 PM | | Config: don't compute while Ryse.exe is running
12/6/2016 12:09:52 PM | | Config: don't use GPUs while Rage.exe is running
12/6/2016 12:09:52 PM | | Config: don't use GPUs while Rage64.exe is running
12/6/2016 12:09:52 PM | Milkyway@Home | URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 704050; resource share 100
12/6/2016 12:11:08 PM | Milkyway@Home | Starting task de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_2_1480516808_2062060_1
12/6/2016 12:11:29 PM | Milkyway@Home | Sending scheduler request: To fetch work.
12/6/2016 12:11:29 PM | Milkyway@Home | Reporting 1 completed tasks
12/6/2016 12:11:29 PM | Milkyway@Home | Requesting new tasks for AMD/ATI GPU
12/6/2016 12:11:30 PM | Milkyway@Home | Scheduler request completed: got 13 new tasks
12/6/2016 12:12:41 PM | Milkyway@Home | Message from task: 0
12/6/2016 12:12:41 PM | Milkyway@Home | Computation for task de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_2_1480516808_2062060_1 finished

____________
-mpyusko

AMD FX-8350 @ 4.3GHz
AMD Radeon RX 480 8GB @ 1342MHz/2000MHz

bluestang
Joined: 13 Oct 16
Posts: 39
Credit: 123,726,595
RAC: 1,144,921

Message 65987 - Posted: 6 Dec 2016, 20:13:57 UTC - in response to Message 65986.

Jake bundled the WU to include 5 tasks to take some load off the server and keep the crunchers fed without interruption.

See this thread...
http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4052

Jesse Viviano
Joined: 4 Feb 11
Posts: 82
Credit: 35,684,527
RAC: 12,905

Message 65988 - Posted: 7 Dec 2016, 3:44:51 UTC - in response to Message 65986.

There is a bug in how percentages are calculated by the MilkyWay@home application. As for why your work units are slower, the newer work units bundle 5 of the older work units into one bigger work unit to reduce BOINC server load because double-precision capable GPUs run these tasks much faster than CPUs, causing GPU crunchers to overwhelm the BOINC server.
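
To illustrate the bundling arithmetic: if each of the 5 bundled sub-units tracks its own progress from 0 to 100%, an overall figure for the bundle has to combine the number of finished sub-units with the fraction of the current one. The following is only a minimal sketch of that idea; the function and parameter names (`bundle_progress`, `completed`, `current_fraction`) are hypothetical and this is not the MilkyWay@home application's actual code.

```python
def bundle_progress(completed: int, current_fraction: float, bundle_size: int = 5) -> float:
    """Overall progress (0.0 to 1.0) of a bundle of equally sized sub-units.

    completed        -- number of sub-units already finished (hypothetical name)
    current_fraction -- progress of the sub-unit now running, 0.0 to 1.0
    bundle_size      -- sub-units per bundle; 5 matches the "bundle5" tasks
    """
    if not 0 <= completed <= bundle_size:
        raise ValueError("completed must be between 0 and bundle_size")
    if not 0.0 <= current_fraction <= 1.0:
        raise ValueError("current_fraction must be between 0.0 and 1.0")
    if completed == bundle_size:
        return 1.0
    return (completed + current_fraction) / bundle_size


# Each finished sub-unit moves the bundle forward by exactly one fifth.
for done in range(6):
    print(f"{done} sub-units done -> {bundle_progress(done, 0.0):.2%}")
```

Under this model the boundaries between sub-units fall at exactly 20.00%, 40.00%, 60.00% and 80.00%, which is where the restarts described in the first post are seen.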

Michael Yusko
Joined: 27 May 11
Posts: 22
Credit: 51,505,849
RAC: 33,967

Message 65989 - Posted: 7 Dec 2016, 4:43:56 UTC

Ah, this makes sense. I just looked a little closer and the timer counts up appropriately, but the countdown changes according to the restarts. A bit confusing.

My RX 480 burns through these things at 1:32 each. By contrast my HD 7770 on another machine does them in 5:47 each.
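
For comparison, the two times can be broken down per bundled sub-unit. This is a rough sketch that assumes both times are for bundle5 tasks (5 sub-units each); the helper name `mmss_to_seconds` is made up for the example.

```python
BUNDLE_SIZE = 5  # assumption: both cards were running bundle5 tasks


def mmss_to_seconds(text: str) -> int:
    """Convert a 'm:ss' string such as '1:32' into seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


rx480_s = mmss_to_seconds("1:32")   # RX 480 time per task
hd7770_s = mmss_to_seconds("5:47")  # HD 7770 time per task

print(f"RX 480:  {rx480_s} s per task, {rx480_s / BUNDLE_SIZE:.1f} s per sub-unit")
print(f"HD 7770: {hd7770_s} s per task, {hd7770_s / BUNDLE_SIZE:.1f} s per sub-unit")
print(f"RX 480 is about {hd7770_s / rx480_s:.2f}x faster")
```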

Thanks.
____________
-mpyusko

AMD FX-8350 @ 4.3GHz
AMD Radeon RX 480 8GB @ 1342MHz/2000MHz

IKI
Joined: 17 Sep 13
Posts: 12
Credit: 289,071,486
RAC: 3,142,747

Message 67065 - Posted: 11 Feb 2018, 16:40:04 UTC

I too suddenly started getting this behavior a few days ago.
No idea why. The computer was left alone (Win10, AMD Ryzen 1700X, FirePro S9150 GPU).

Before: 4 MW tasks in parallel, 95 sec GPU time and 25 sec CPU time each
Now: (if left to 4 tasks simultaneously) 500 sec GPU time and 170 sec CPU time.

Behavior:
The GPU sits at 0% load and idles for a while at 0% progress (or at 20, 40, 60 and 80%), then actually starts working and sprints to the next 1/5 step. Rinse and repeat.
The CPU doesn't seem to do much either way.

I tried a reinstall of the Firepro S9150 drivers. No changes.
Restart of the project. No changes.

Any ideas??

Here is an extract from the output of a typical WU:
<core_client_version>7.8.3</core_client_version>
<![CDATA[
<stderr_txt>
<search_application> milkyway_separation 1.46 Windows x86 double OpenCL </search_application>
Reading preferences ended prematurely
BOINC GPU type suggests using OpenCL vendor 'Advanced Micro Devices, Inc.'
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Switching to Parameter File 'astronomy_parameters.txt'
<number_WUs> 5 </number_WUs>
<number_params_per_WU> 20 </number_params_per_WU>
Using SSE4.1 path
Found 2 platforms
Platform 0 information:
Name: NVIDIA CUDA
Version: OpenCL 1.2 CUDA 9.1.75
Vendor: NVIDIA Corporation
Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_khr_gl_event cl_nv_create_buffer
Profile: FULL_PROFILE
Platform 1 information:
Name: AMD Accelerated Parallel Processing
Version: OpenCL 2.0 AMD-APP (1800.12)
Vendor: Advanced Micro Devices, Inc.
Extensions: cl_khr_icd cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_amd_event_callback cl_amd_offline_devices
Profile: FULL_PROFILE
Using device 1 on platform 1
Found 2 CL devices
Device 'Hawaii' (Advanced Micro Devices, Inc.:0x1002) (CL_DEVICE_TYPE_GPU)
Board: AMD FirePro S9150 (FireGL V)
Driver version: 1800.12 (VM)
Version: OpenCL 1.2 AMD-APP (1800.12)
Compute capability: 0.0
Max compute units: 44
Clock frequency: 900 Mhz
Global mem size: 3221225472
Local mem size: 32768
Max const buf size: 65536
Double extension: cl_khr_fp64
<search_application> milkyway_separation 1.46 Windows x86 double OpenCL </search_application>
Reading preferences ended prematurely
BOINC GPU type suggests using OpenCL vendor 'Advanced Micro Devices, Inc.'
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Switching to Parameter File 'astronomy_parameters.txt'
<number_WUs> 5 </number_WUs>
<number_params_per_WU> 20 </number_params_per_WU>
Using SSE4.1 path
<search_application> milkyway_separation 1.46 Windows x86 double OpenCL </search_application>
Reading preferences ended prematurely
BOINC GPU type suggests using OpenCL vendor 'Advanced Micro Devices, Inc.'
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Switching to Parameter File 'astronomy_parameters.txt'
<number_WUs> 5 </number_WUs>
<number_params_per_WU> 20 </number_params_per_WU>
Using SSE4.1 path
<search_application> milkyway_separation 1.46 Windows x86 double OpenCL </search_application>
Reading preferences ended prematurely
BOINC GPU type suggests using OpenCL vendor 'Advanced Micro Devices, Inc.'
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Switching to Parameter File 'astronomy_parameters.txt'
<number_WUs> 5 </number_WUs>
<number_params_per_WU> 20 </number_params_per_WU>
Using SSE4.1 path
Found 2 platforms
Platform 0 information:
Name: NVIDIA CUDA
Version: OpenCL 1.2 CUDA 9.1.75
Vendor: NVIDIA Corporation
Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_khr_gl_event cl_nv_create_buffer
Profile: FULL_PROFILE
Platform 1 information:
Name: AMD Accelerated Parallel Processing
Version: OpenCL 2.0 AMD-APP (1800.12)
Vendor: Advanced Micro Devices, Inc.
Extensions: cl_khr_icd cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_amd_event_callback cl_amd_offline_devices
Profile: FULL_PROFILE
Using device 1 on platform 1
Found 2 CL devices
Device 'Hawaii' (Advanced Micro Devices, Inc.:0x1002) (CL_DEVICE_TYPE_GPU)
Board: AMD FirePro S9150 (FireGL V)
Driver version: 1800.12 (VM)
Version: OpenCL 1.2 AMD-APP (1800.12)
Compute capability: 0.0
Max compute units: 44
Clock frequency: 900 Mhz
Global mem size: 3221225472
Local mem size: 32768
Max const buf size: 65536
Double extension: cl_khr_fp64
Estimated AMD GPU GFLOP/s: 396 SP GFLOP/s, 79 DP FLOP/s
Warning: Bizarrely low flops (79). Defaulting to 100
Using a target frequency of 60.0
Using a block size of 11264 with 4 blocks/chunk
Using clWaitForEvents() for polling (mode -1)
Range: { nu_steps = 320, mu_steps = 800, r_steps = 700 }
Iteration area: 560000
Chunk estimate: 11
Num chunks: 13
Chunk size: 45056
Added area: 25728
Effective area: 585728
Initial wait: 13 ms
Integration time: 18.623291 s. Average time per iteration = 58.197786 ms
Integral 0 time = 19.168719 s
Running likelihood with 84044 stars
Likelihood time = 2.110552 s
<background_integral> 0.000117178046727 </background_integral>
<stream_integral> 4.561654032970371 261.985944750789370 65.464078437813157 </stream_integral>
<background_likelihood> -3.387460651512970 </background_likelihood>
<stream_only_likelihood> -123.098837657635800 -3.955471871042226 -3.762280414084465 </stream_only_likelihood>
<search_likelihood> -2.973190267938498 </search_likelihood>
Using SSE4.1 path
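
As a sanity check, the chunking and timing figures in the extract above are consistent with each other. The relationships below are inferred only from the printed numbers (they are an assumption, not taken from the application's source), and the variable names are chosen for the example.

```python
import math

# Values copied from the stderr extract.
nu_steps, mu_steps, r_steps = 320, 800, 700
block_size, blocks_per_chunk = 11264, 4
integration_time_s = 18.623291

# Assumed relationships, inferred from the log output.
iteration_area = mu_steps * r_steps                  # log: Iteration area
chunk_size = block_size * blocks_per_chunk           # log: Chunk size
num_chunks = math.ceil(iteration_area / chunk_size)  # log: Num chunks
effective_area = num_chunks * chunk_size             # log: Effective area
added_area = effective_area - iteration_area         # log: Added area
avg_iteration_ms = integration_time_s / nu_steps * 1000

print(iteration_area, chunk_size, num_chunks, effective_area, added_area)
print(f"average time per iteration: {avg_iteration_ms:.4f} ms")
```

The integration itself accounts for roughly 19 seconds of this sub-unit, so a 500 second task would spend most of its time outside the GPU computation, which fits the idle GPU described above.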

IKI
Joined: 17 Sep 13
Posts: 12
Credit: 289,071,486
RAC: 3,142,747

Message 67067 - Posted: 11 Feb 2018, 17:38:49 UTC

As a small update:
Looking closely at the output of my old, normal WUs and these new, oddly behaving ones, I see one difference:

This line: "BOINC GPU type suggests using OpenCL vendor 'Advanced Micro Devices, Inc.'" is nowhere to be found in the old good WU.

Also, there are many more of these:
"BOINC GPU type suggests using OpenCL vendor 'Advanced Micro Devices, Inc.'
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Switching to Parameter File 'astronomy_parameters.txt'
<number_WUs> 5 </number_WUs>
<number_params_per_WU> 20 </number_params_per_WU>
Using SSE4.1 path"
In the old WU this was only there at the start.

Not sure if that hints at anything though...
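
One way to make this comparison less manual is to count how often the startup banner appears in a task's stderr output. The sketch below is only an illustration: the helper name `count_lines_starting_with` is made up, the sample is a shortened excerpt in the shape of the output posted above, and whether extra banners really mean extra application restarts is an assumption.

```python
def count_lines_starting_with(stderr_text: str, marker: str) -> int:
    """Count the lines that begin with `marker`, ignoring leading whitespace."""
    return sum(
        1 for line in stderr_text.splitlines() if line.lstrip().startswith(marker)
    )


# Shortened excerpt in the shape of the stderr output posted above.
sample = """\
<search_application> milkyway_separation 1.46 Windows x86 double OpenCL </search_application>
Switching to Parameter File 'astronomy_parameters.txt'
<number_WUs> 5 </number_WUs>
Using SSE4.1 path
<search_application> milkyway_separation 1.46 Windows x86 double OpenCL </search_application>
Switching to Parameter File 'astronomy_parameters.txt'
<number_WUs> 5 </number_WUs>
Using SSE4.1 path
<search_likelihood> -2.973190267938498 </search_likelihood>
"""

banners = count_lines_starting_with(sample, "<search_application>")
results = count_lines_starting_with(sample, "<search_likelihood>")
print(f"startup banners: {banners}, finished sub-results: {results}")
```

Running the same count on the saved output of an old, normal WU and a new, slow one would show whether the number of banners per finished result has changed.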

mmonnin
Joined: 2 Oct 16
Posts: 99
Credit: 79,301,187
RAC: 105

Message 67069 - Posted: 12 Feb 2018, 2:31:32 UTC

Nothing's wrong. See the 2nd post in the thread.

IKI
Joined: 17 Sep 13
Posts: 12
Credit: 289,071,486
RAC: 3,142,747

Message 67070 - Posted: 12 Feb 2018, 4:48:52 UTC - in response to Message 67069.

Thanks, but I saw it, and the situation seems different this time.
First, as I said, the GPU sits idle most of the time, apparently waiting for something at these 1/5 increments.
Second, the tasks I had a few days ago were already bundle5 tasks, and they were, as you would expect, loading the GPU to near 100%.

IKI
Joined: 17 Sep 13
Posts: 12
Credit: 289,071,486
RAC: 3,142,747

Message 67071 - Posted: 12 Feb 2018, 15:15:05 UTC - in response to Message 67070.

Ok, things are back to normal. GPU at 100% load. WU being processed in the usual time.

For future reference, here is what seems to have solved it for now:
-uninstalled the GPU drivers
-used the AMD clean up utility
-reinstalled the drivers, but this time a different version: 15.301.2601.1002-whql-firepro-windows-retail.exe

Let's hope that this time it won't revert by itself to this strange abnormal behavior...

mikey
Joined: 8 May 09
Posts: 2160
Credit: 206,569,545
RAC: 171,474

Message 67108 - Posted: 19 Feb 2018, 16:38:03 UTC - in response to Message 67071.

Ok, things are back to normal. GPU at 100% load. WU being processed in the usual time.

For future reference, here is what seems to have solved it for now:
-uninstalled the GPU drivers
-used the AMD clean up utility
-reinstalled the drivers, but this time a different version: 15.301.2601.1002-whql-firepro-windows-retail.exe

Let's hope that this time it won't revert by itself to this strange abnormal behavior...


If it's Win10, be aware of Windows updates; they often do things like install drivers of their own choosing, which aren't always good for us.


Copyright © 2018 AstroInformatics Group