WU restart every 20%
Author | Message |
---|---|
Send message Joined: 27 May 11 Posts: 28 Credit: 209,380,724 RAC: 0 |
When I start a WU, it eventually runs to completion, but only after it runs to 20%, restarts, runs to 40%, restarts, runs to 60%, restarts, runs to 80%, restarts, then finally runs to 100% and completes. See the log below for an example WU. It's exactly 20.00%, 40.00%, etc., too. Thoughts? It's slowing me down dramatically.
12/6/2016 12:09:51 PM | | Starting BOINC client version 7.6.22 for windows_x86_64
12/6/2016 12:09:51 PM | | log flags: file_xfer, sched_ops, task
12/6/2016 12:09:51 PM | | Libraries: libcurl/7.45.0 OpenSSL/1.0.2d zlib/1.2.8
12/6/2016 12:09:51 PM | | Data directory: D:\ProgramData\BOINC
12/6/2016 12:09:51 PM | | Running under account mpyusko
12/6/2016 12:09:51 PM | | OpenCL: AMD/ATI GPU 0: Ellesmere (driver version 2117.14 (VM), device version OpenCL 2.0 AMD-APP (2117.14), 8192MB, 8192MB available, 3865 GFLOPS peak)
12/6/2016 12:09:51 PM | | OpenCL CPU: AMD FX(tm)-8350 Eight-Core Processor (OpenCL driver vendor: Advanced Micro Devices, Inc., driver version 2117.14 (sse2,avx,fma4), device version OpenCL 1.2 AMD-APP (2117.14))
12/6/2016 12:09:52 PM | | Host name: gibson
12/6/2016 12:09:52 PM | | Processor: 8 AuthenticAMD AMD FX(tm)-8350 Eight-Core Processor [Family 21 Model 2 Stepping 0]
12/6/2016 12:09:52 PM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni ssse3 fma cx16 sse4_1 sse4_2 popcnt aes f16c syscall nx lm avx svm sse4a osvw ibs xop skinit wdt lwp fma4 tce tbm topx page1gb rdtscp bmi1
12/6/2016 12:09:52 PM | | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.14393.00)
12/6/2016 12:09:52 PM | | Memory: 15.90 GB physical, 31.90 GB virtual
12/6/2016 12:09:52 PM | | Disk: 931.51 GB total, 374.99 GB free
12/6/2016 12:09:52 PM | | Local time is UTC -5 hours
12/6/2016 12:09:52 PM | | Config: don't compute while hl2.exe is running
12/6/2016 12:09:52 PM | | Config: don't compute while Rage.exe is running
12/6/2016 12:09:52 PM | | Config: don't compute while Rage64.exe is running
12/6/2016 12:09:52 PM | | Config: don't compute while Ryse.exe is running
12/6/2016 12:09:52 PM | | Config: don't use GPUs while Rage.exe is running
12/6/2016 12:09:52 PM | | Config: don't use GPUs while Rage64.exe is running
12/6/2016 12:09:52 PM | Milkyway@Home | URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 704050; resource share 100
12/6/2016 12:11:08 PM | Milkyway@Home | Starting task de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_2_1480516808_2062060_1
12/6/2016 12:11:29 PM | Milkyway@Home | Sending scheduler request: To fetch work.
12/6/2016 12:11:29 PM | Milkyway@Home | Reporting 1 completed tasks
12/6/2016 12:11:29 PM | Milkyway@Home | Requesting new tasks for AMD/ATI GPU
12/6/2016 12:11:30 PM | Milkyway@Home | Scheduler request completed: got 13 new tasks
12/6/2016 12:12:41 PM | Milkyway@Home | Message from task: 0
12/6/2016 12:12:41 PM | Milkyway@Home | Computation for task de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_2_1480516808_2062060_1 finished
-mpyusko AMD FX-8350 @ 4.3GHz AMD Radeon RX 480 8GB @ 1342MHz/2000MHz |
Send message Joined: 13 Oct 16 Posts: 112 Credit: 1,174,293,644 RAC: 0 |
Jake bundled the WU to include 5 tasks to take some load off the server and keep the crunchers fed without interruption. See this thread... http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4052 |
Send message Joined: 4 Feb 11 Posts: 86 Credit: 60,913,150 RAC: 0 |
There is a bug in how the MilkyWay@home application calculates percentages. As for why your work units are slower: the newer work units bundle 5 of the older work units into one bigger work unit to reduce load on the BOINC server. Double-precision-capable GPUs run these tasks much faster than CPUs, so GPU crunchers were overwhelming the server. |
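To illustrate why the restarts land on exact multiples of 20%, here is a minimal Python sketch of how overall progress could be derived for a bundle of 5 sub-tasks. It is not the MilkyWay@home application's code; the function names `bundle_progress` and `subtask_boundaries` are hypothetical, and the only fact taken from the thread is the bundle size of 5.

```python
def bundle_progress(completed_subtasks: int, subtask_fraction: float,
                    bundle_size: int = 5) -> float:
    """Overall percent done for a bundled work unit (illustrative only).

    completed_subtasks: how many sub-tasks in the bundle have finished.
    subtask_fraction:   progress of the sub-task currently running, 0.0 to 1.0.
    bundle_size:        sub-tasks per bundle (5, per the thread).
    """
    if not 0 <= completed_subtasks <= bundle_size:
        raise ValueError("completed_subtasks out of range")
    if not 0.0 <= subtask_fraction <= 1.0:
        raise ValueError("subtask_fraction must be between 0 and 1")
    # Cap at bundle_size so a finished bundle never reports more than 100%.
    done = min(completed_subtasks + subtask_fraction, bundle_size)
    return 100.0 * done / bundle_size


def subtask_boundaries(bundle_size: int = 5) -> list:
    """Percent values at which one sub-task ends and the next one starts."""
    return [100.0 * k / bundle_size for k in range(1, bundle_size + 1)]


if __name__ == "__main__":
    print(subtask_boundaries())      # boundaries between the 5 sub-tasks
    print(bundle_progress(2, 0.5))   # halfway through the third sub-task
```

Under this model each sub-task start is a fresh run of the same computation, which would look like a "restart" at every boundary even though the bundle as a whole is progressing normally.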
Send message Joined: 27 May 11 Posts: 28 Credit: 209,380,724 RAC: 0 |
Ah, this makes sense. I just looked a little closer and the timer counts up appropriately, but the countdown changes according to the restarts. A bit confusing. My RX 480 burns through these things at 1:32 each. By contrast my HD 7770 on another machine does them in 5:47 each. Thanks. -mpyusko AMD FX-8350 @ 4.3GHz AMD Radeon RX 480 8GB @ 1342MHz/2000MHz |
Send message Joined: 17 Sep 13 Posts: 12 Credit: 2,706,117,276 RAC: 0 |
I too suddenly started getting this behavior a few days ago. No idea why; the computer was left alone (Win10, AMD Ryzen 1700X, FirePro S9150 GPU).
Before: 4 MW tasks in parallel, 95 sec GPU time and 25 sec CPU time each.
Now (if left at 4 simultaneous tasks): 500 sec GPU time and 170 sec CPU time.
Behavior: the GPU is at 0% load and just idles for a while at 0% (or 20, 40, 60 and 80%), then actually starts to work and sprints to the next 1/5 step. Rinse and repeat. The CPU doesn't seem to do much anyway.
I tried reinstalling the FirePro S9150 drivers: no change. I restarted the project: no change. Any ideas?
Here is an extract from a typical WU:
<core_client_version>7.8.3</core_client_version>
<![CDATA[
<stderr_txt>
<search_application> milkyway_separation 1.46 Windows x86 double OpenCL </search_application>
Reading preferences ended prematurely
BOINC GPU type suggests using OpenCL vendor 'Advanced Micro Devices, Inc.'
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Switching to Parameter File 'astronomy_parameters.txt'
<number_WUs> 5 </number_WUs>
<number_params_per_WU> 20 </number_params_per_WU>
Using SSE4.1 path
Found 2 platforms
Platform 0 information:
Name: NVIDIA CUDA
Version: OpenCL 1.2 CUDA 9.1.75
Vendor: NVIDIA Corporation
Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_khr_gl_event cl_nv_create_buffer
Profile: FULL_PROFILE
Platform 1 information:
Name: AMD Accelerated Parallel Processing
Version: OpenCL 2.0 AMD-APP (1800.12)
Vendor: Advanced Micro Devices, Inc.
Extensions: cl_khr_icd cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_amd_event_callback cl_amd_offline_devices
Profile: FULL_PROFILE
Using device 1 on platform 1
Found 2 CL devices
Device 'Hawaii' (Advanced Micro Devices, Inc.:0x1002) (CL_DEVICE_TYPE_GPU)
Board: AMD FirePro S9150 (FireGL V)
Driver version: 1800.12 (VM)
Version: OpenCL 1.2 AMD-APP (1800.12)
Compute capability: 0.0
Max compute units: 44
Clock frequency: 900 Mhz
Global mem size: 3221225472
Local mem size: 32768
Max const buf size: 65536
Double extension: cl_khr_fp64
<search_application> milkyway_separation 1.46 Windows x86 double OpenCL </search_application>
Reading preferences ended prematurely
BOINC GPU type suggests using OpenCL vendor 'Advanced Micro Devices, Inc.'
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Switching to Parameter File 'astronomy_parameters.txt'
<number_WUs> 5 </number_WUs>
<number_params_per_WU> 20 </number_params_per_WU>
Using SSE4.1 path
<search_application> milkyway_separation 1.46 Windows x86 double OpenCL </search_application>
Reading preferences ended prematurely
BOINC GPU type suggests using OpenCL vendor 'Advanced Micro Devices, Inc.'
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Switching to Parameter File 'astronomy_parameters.txt'
<number_WUs> 5 </number_WUs>
<number_params_per_WU> 20 </number_params_per_WU>
Using SSE4.1 path
<search_application> milkyway_separation 1.46 Windows x86 double OpenCL </search_application>
Reading preferences ended prematurely
BOINC GPU type suggests using OpenCL vendor 'Advanced Micro Devices, Inc.'
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Switching to Parameter File 'astronomy_parameters.txt'
<number_WUs> 5 </number_WUs>
<number_params_per_WU> 20 </number_params_per_WU>
Using SSE4.1 path
Found 2 platforms
Platform 0 information:
Name: NVIDIA CUDA
Version: OpenCL 1.2 CUDA 9.1.75
Vendor: NVIDIA Corporation
Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_khr_gl_event cl_nv_create_buffer
Profile: FULL_PROFILE
Platform 1 information:
Name: AMD Accelerated Parallel Processing
Version: OpenCL 2.0 AMD-APP (1800.12)
Vendor: Advanced Micro Devices, Inc.
Extensions: cl_khr_icd cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_amd_event_callback cl_amd_offline_devices
Profile: FULL_PROFILE
Using device 1 on platform 1
Found 2 CL devices
Device 'Hawaii' (Advanced Micro Devices, Inc.:0x1002) (CL_DEVICE_TYPE_GPU)
Board: AMD FirePro S9150 (FireGL V)
Driver version: 1800.12 (VM)
Version: OpenCL 1.2 AMD-APP (1800.12)
Compute capability: 0.0
Max compute units: 44
Clock frequency: 900 Mhz
Global mem size: 3221225472
Local mem size: 32768
Max const buf size: 65536
Double extension: cl_khr_fp64
Estimated AMD GPU GFLOP/s: 396 SP GFLOP/s, 79 DP FLOP/s
Warning: Bizarrely low flops (79). Defaulting to 100
Using a target frequency of 60.0
Using a block size of 11264 with 4 blocks/chunk
Using clWaitForEvents() for polling (mode -1)
Range: { nu_steps = 320, mu_steps = 800, r_steps = 700 }
Iteration area: 560000
Chunk estimate: 11
Num chunks: 13
Chunk size: 45056
Added area: 25728
Effective area: 585728
Initial wait: 13 ms
Integration time: 18.623291 s. Average time per iteration = 58.197786 ms
Integral 0 time = 19.168719 s
Running likelihood with 84044 stars
Likelihood time = 2.110552 s
<background_integral> 0.000117178046727 </background_integral>
<stream_integral> 4.561654032970371 261.985944750789370 65.464078437813157 </stream_integral>
<background_likelihood> -3.387460651512970 </background_likelihood>
<stream_only_likelihood> -123.098837657635800 -3.955471871042226 -3.762280414084465 </stream_only_likelihood>
<search_likelihood> -2.973190267938498 </search_likelihood>
Using SSE4.1 path |
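The chunking and timing figures in the stderr extract above are internally consistent, which suggests the GPU kernel itself ran normally once it got going. The Python sketch below re-derives them from the logged inputs. The relationships used (area = mu_steps × r_steps, chunk size = block size × blocks per chunk, chunk count = area divided by chunk size and rounded up, one iteration per nu step) are inferred from the numbers in the log, not taken from the application source, and the function name `check_log_arithmetic` is hypothetical.

```python
import math


def check_log_arithmetic() -> dict:
    """Recompute the derived values printed in the stderr extract."""
    # Inputs copied from the log.
    nu_steps, mu_steps, r_steps = 320, 800, 700
    block_size, blocks_per_chunk = 11264, 4
    integration_time_s = 18.623291

    # Inferred relationships (assumptions, see lead-in).
    iteration_area = mu_steps * r_steps
    chunk_size = block_size * blocks_per_chunk
    num_chunks = math.ceil(iteration_area / chunk_size)
    effective_area = num_chunks * chunk_size
    added_area = effective_area - iteration_area
    avg_ms_per_iteration = integration_time_s / nu_steps * 1000.0

    return {
        "iteration_area": iteration_area,
        "chunk_size": chunk_size,
        "num_chunks": num_chunks,
        "effective_area": effective_area,
        "added_area": added_area,
        "avg_ms_per_iteration": avg_ms_per_iteration,
    }


if __name__ == "__main__":
    for name, value in check_log_arithmetic().items():
        print(f"{name}: {value}")
```

The roughly 18.6 s integration time is for one of the five sub-tasks, so pure compute for a bundle would be on the order of 100 s; the reported 500 s therefore points at time spent idle between sub-tasks rather than at slower computation.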
Send message Joined: 17 Sep 13 Posts: 12 Credit: 2,706,117,276 RAC: 0 |
As a small update: comparing the output of my old, normal WUs with these new, oddly behaving ones, I see one difference. The line "BOINC GPU type suggests using OpenCL vendor 'Advanced Micro Devices, Inc.'" is nowhere to be found in the old, good WU. There are also many more of these: "BOINC GPU type suggests using OpenCL vendor 'Advanced Micro Devices, Inc.' Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4' Switching to Parameter File 'astronomy_parameters.txt' <number_WUs> 5 </number_WUs> <number_params_per_WU> 20 </number_params_per_WU> Using SSE4.1 path" In the old WU this was only there at the start. Not sure if that hints at anything though... |
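One way to make this comparison less manual is to count how often the startup banner appears in each task's stderr output. The Python sketch below does that with plain string counting. The helper name `count_markers` and the sample text are hypothetical, and treating each `<search_application>` banner as one application start is an assumption that the thread does not confirm.

```python
def count_markers(stderr_text: str, markers: list) -> dict:
    """Count non-overlapping occurrences of each marker string in the text."""
    return {marker: stderr_text.count(marker) for marker in markers}


if __name__ == "__main__":
    # Shortened, made-up sample in the shape of the stderr extract above.
    sample = (
        "<search_application> milkyway_separation 1.46 </search_application>\n"
        "Using SSE4.1 path\n"
        "Found 2 platforms\n"
        "<search_application> milkyway_separation 1.46 </search_application>\n"
        "Using SSE4.1 path\n"
        "<search_application> milkyway_separation 1.46 </search_application>\n"
        "Using SSE4.1 path\n"
    )
    counts = count_markers(sample, ["<search_application>", "Found 2 platforms"])
    for marker, n in counts.items():
        print(f"{marker!r}: {n}")
```

Running it on the stderr text of a good task and a slow one would show whether the slow task really logs more banners, and more banners than platform listings, as described above.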
Send message Joined: 2 Oct 16 Posts: 167 Credit: 1,005,998,219 RAC: 45,670 |
Nothing's wrong. See the 2nd post in the thread. |
Send message Joined: 17 Sep 13 Posts: 12 Credit: 2,706,117,276 RAC: 0 |
Thanks, but I saw it, and the situation seems different this time. First, as I said, the GPU sits idle most of the time, apparently waiting for something at these 1/5 increments. Second, the tasks I had a few days ago were already bundle5 tasks and were, as you would expect, loading the GPU to near 100%. |
Send message Joined: 17 Sep 13 Posts: 12 Credit: 2,706,117,276 RAC: 0 |
OK, things are back to normal. GPU at 100% load. WUs being processed in the usual time. For future reference, here is what seems to have solved it for now:
- uninstalled the GPU drivers
- used the AMD cleanup utility
- reinstalled the drivers, but this time a different version: 15.301.2601.1002-whql-firepro-windows-retail.exe
Let's hope that this time it won't revert by itself to this strange, abnormal behavior... |
Send message Joined: 8 May 09 Posts: 3321 Credit: 520,484,862 RAC: 26,419 |
"Ok, things are back to normal. GPU at 100% load. WU being processed in the usual time." If it's Win10, be aware of Windows updates; they often do things like install drivers they like that aren't always good for us. |
©2024 Astroinformatics Group