 
    
        Work units not completing or are stalled
    
| Author | Message | 
|---|---|
|  d_a_dempsey Joined: 7 Jan 21 Posts: 14 Credit: 84,785,708 RAC: 0 | 
 After Windows rebooted last night for an update, my MW tasks either a) reach 100% but never complete and keep accruing time, or b) sit at 25% or 50%, accruing time while the estimated time to finish grows at the same rate. Before this, my 2 GPUs were completing WUs in 2 minutes and 4 minutes. Rebooting the PC, or powering it off and on, seems to work for the current WU, and then the problem above comes back. Nvidia driver (current): 461.09, dated 01/07/2021. BOINC client (current): 7.6.11 (x64). CPU-based WUs for WCG are working fine. Any thoughts or suggestions? David | 
|  mikey Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 | 
 d_a_dempsey wrote: "After Windows rebooted last night from an update, my MW tasks are either a) reaching 100% and not complete, continuing to accrue time, or b) sit at 25%, 50% accruing time and increasing the estimate time to finish at the same rate." I would remove and/or reload the GPU drivers, as MS has a VERY bad habit of replacing the Nvidia ones with its own, which seem similar but don't work well, or at all, for crunching. | 
|  d_a_dempsey Joined: 7 Jan 21 Posts: 14 Credit: 84,785,708 RAC: 0 | 
 
 I reinstalled the Nvidia driver, rebooted PC. The two stalled tasks completed almost immediately, and promptly stalled at 50% on the next 2 it started. Just sits there with Elapsed Time and Time Remaining merrily incrementing away. :( | 
|  mikey Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 | 
 
 I can't see your tasks or PCs because you have them hidden. Are you using 100% of your CPU cores for crunching too? If so, that could be a problem, as your GPU needs CPU time to send and receive data as it crunches, and if the cores are busy it will slow down. Are you running another GPU project at the same time, and maybe it's running instead of MilkyWay? | 
|  d_a_dempsey Joined: 7 Jan 21 Posts: 14 Credit: 84,785,708 RAC: 0 | 
 Should be available soon; I adjusted preferences. Didn't know you could do that. I have 2 computers. One is an ancient HP with an NVIDIA GTX 660; that one's not having problems (of course!). The one having problems is a Dell Alienware Area 51 R2, i7-5820K @3.3GHz, with an NVIDIA GTX 980 and an NVIDIA GTX 1080 TI. Computing preferences are "Use at most 80% of CPUs" and "Use at most 85% of CPU time". I have not adjusted config files to run more than one task per GPU. I'm very new to this project but a 600M credit cruncher on GPUgrid. They're empty, so I came here and everything was great until yesterday. CPU-wise, I'm crunching for WCG, too, but my problems are specifically with the GPU WUs. Hope this helps, David | 
|  Keith Myers Joined: 24 Jan 11 Posts: 733 Credit: 564,748,565 RAC: 12,117 | 
 I don't see anything wrong with most of your tasks.  Just a few outliers that appear to be starved of enough cpu support and thus have double the runtimes and cputimes. Why not run at 100% cpu usage? Save a few threads for machine housekeeping and keep them free of BOINC loading. Your use at most 80% of your cpu threads is fine. I suspect your WCG cpu tasks are stealing too much support from your gpu tasks occasionally.   | 
|  d_a_dempsey Joined: 7 Jan 21 Posts: 14 Credit: 84,785,708 RAC: 0 | 
 Keith Myers wrote: "I don't see anything wrong with most of your tasks. Just a few outliers that appear to be starved of enough cpu support and thus have double the runtimes and cputimes." I'm not sure that the WCG cpu tasks are stealing too much from MW@H. 
 
 David   | 
|  d_a_dempsey Joined: 7 Jan 21 Posts: 14 Credit: 84,785,708 RAC: 0 | 
 Setting it to 84% helped a little. One of the MW@H tasks started working better, and a 9th WCG task started. Clearly, BOINC is having trouble doing math with fractional CPU usages, e.g. 0.985. Would it be better to add an app_config.xml for MW@H and set <gpu_versions><cpu_usage>1</cpu_usage></gpu_versions> so that BOINC can allot resources better, or do an app_config.xml for WCG and try to limit it to 8 CPUs? Note: I only put the relevant XML pieces in this post; I know there's more to it. :) David   | 
|  Keith Myers Joined: 24 Jan 11 Posts: 733 Credit: 564,748,565 RAC: 12,117 | 
 I hope that you realize that setting cpu usages for tasks has NOTHING to do with how much of a cpu a task uses. It is only for BOINC scheduling, to determine how much work to run concurrently. The science application ALONE is what determines how much of a cpu thread it needs to complete the compute of a task. Some applications are stingy in their need of cpu support and some actually need more than a single cpu thread. Einstein Gravity Wave tasks, for example, use as much as 125% of a cpu thread for each gpu task. It still seems that you are overcommitted on the cpu. Drop one WCG task and I bet you won't see the occasional stalling on the MW tasks. I always run one cpu thread per gpu task on Nvidia cards. Never a problem with that setting. Nvidia apps just need more cpu support compared to AMD apps. You can run an app_config and set the cpu and gpu usage for your applications. You can also put a max_concurrent limit on the WCG tasks. All that would help your problem.   | 
| Joined: 14 Apr 17 Posts: 5 Credit: 361 RAC: 0 | 
 I'm having exactly the same problem, and it isn't from lack of cpu grunt to back up the gpu. Went from finishing 3 tasks at once in 5 to 6 minutes, with 9 threads of rosetta running also, to tasks not completing after 4+ hours with the whole cpu free for milkyway. 30% cpu usage now, only running 3 milkyway tasks on gpu, no rosetta. Gpu isn't even boosting, it's stuck at idle clocks. Kinda suspect it is the new nvidia drivers as I did update the other day, will roll back tomorrow and test but there is definitely something wrong. I crunch through grcpool so it might be hard to find my workunits | 
|  d_a_dempsey Joined: 7 Jan 21 Posts: 14 Credit: 84,785,708 RAC: 0 | 
 Yes, I am aware; I was just getting better information to BOINC for scheduling. Current app_config.xml files: 
<app_config>
    <app>
      <name>milkyway</name>
      <max_concurrent>2</max_concurrent>
      <gpu_versions>
          <gpu_usage>1</gpu_usage>
          <cpu_usage>1</cpu_usage>
      </gpu_versions>
    </app>
</app_config>
 For WCG: 
<app_config>
    <project_max_concurrent>7</project_max_concurrent>
</app_config>
 With "Use at most 84% of CPUs" and the above settings, I have 9 tasks: 2 MW (1 CPU + 1 NVIDIA GPU each) and 7 WCG packets, against 10 of 12 total threads. Two threads are reserved for the OS and me. The problem still occurs as I have described. After stop/suspend, reboot, it will complete the packets it was working on, and stall on the next set. Sometimes it makes it through 2 additional packets. CPU starvation is an interesting thought, but I'm not sure why it would show up 6 days into the project, when it had been running side-by-side with WCG. I suspect it's related to the patch and forced reboot, but I don't have the skill to track down whether it's a corrupted config file or other weirdness. David   | 
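The thread accounting in the post above can be checked with a few lines of Python. This is only a sketch: the idea that the client rounds the "Use at most N% of CPUs" figure down to a whole thread is an assumption, not something stated in the thread, and the function names are made up for illustration. As noted earlier in the thread, `cpu_usage` only informs scheduling; it does not cap what the science app actually uses.

```python
import math

def usable_threads(total_threads, max_cpu_percent):
    """Threads BOINC may schedule on, ASSUMING the percentage is rounded down."""
    return max(1, math.floor(total_threads * max_cpu_percent / 100))

def committed_threads(task_groups):
    """Sum the scheduling budget of (task_count, cpu_usage_per_task) pairs."""
    return sum(count * cpu_usage for count, cpu_usage in task_groups)

# Numbers from the post: 12 threads, 84% limit,
# 2 MW GPU tasks budgeted at 1 CPU each and 7 WCG CPU tasks.
budget = usable_threads(12, 84)
committed = committed_threads([(2, 1.0), (7, 1.0)])
spare = budget - committed

print(f"budget={budget} committed={committed:g} spare={spare:g}")
```

Under that rounding assumption the budget comes out at 10 threads with 9 committed, which agrees with the "10 of 12 total threads" figure in the post and leaves one scheduled thread unassigned.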
|  mikey Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 | 
 d_a_dempsey wrote: "With use at most 84% of CPUs, and the above settings I have 9 tasks, 2 MW (1 CPU + 1 NVIDIA GPU) and 7 WCG packets against 10 of 12 total threads. Two threads reserved for OS and me. The problem still occurs as I have described. After stop/suspend, reboot, it will complete the packets it was working on, and stall on next set. Sometimes it makes it through 2 additional packets." I always blame Microsoft when things change after they do a standard or forced update!! | 
|  Keith Myers Joined: 24 Jan 11 Posts: 733 Credit: 564,748,565 RAC: 12,117 | 
 While the problem is happening, change your logging preferences to enable rr_simulation for at least one cycle of task completion/reporting/request for work and see what it says. BOINC will tell you in the output why it won't start new work. Turn it off after you get the information, as it is especially wordy and the log entries get out of hand.   | 
|  d_a_dempsey Joined: 7 Jan 21 Posts: 14 Credit: 84,785,708 RAC: 0 | 
 Keith Myers wrote: "While the problem is happening, change your logging preferences to set rr_simulation through at least one cycle of task completion/reporting/request for work and see what it says. BOINC will tell you why it won't start new work in the output." I will try that next. Here's what's been tried so far. 
 
 David   | 
|  d_a_dempsey Joined: 7 Jan 21 Posts: 14 Credit: 84,785,708 RAC: 0 | 
 Here's some of the output from the rr_simulation option while both WUs appear to be stalled. Task ending 169_1 is on my GTX 980 and task ending 344_1 is running on my GTX 1080 TI. 
1/16/2021 1:02:08 PM | | Re-reading cc_config.xml
1/16/2021 1:02:08 PM | | Config: don't use GPUs while dndclient.exe is running
1/16/2021 1:02:08 PM | | Config: don't use GPUs while dndclient_awesomium.exe is running
1/16/2021 1:02:08 PM | | Config: don't use GPUs while dndclient64.exe is running
1/16/2021 1:02:08 PM | | Config: don't use GPUs while DNDLauncher.exe is running
1/16/2021 1:02:08 PM | | Config: don't use GPUs while turbineclientlauncher.exe is running
1/16/2021 1:02:08 PM | | Config: use all coprocessors
1/16/2021 1:02:08 PM | | log flags: file_xfer, sched_ops, task
1/16/2021 1:02:08 PM | Milkyway@Home | Found app_config.xml
1/16/2021 1:26:36 PM | | Re-reading cc_config.xml
1/16/2021 1:26:36 PM | | Config: don't use GPUs while dndclient.exe is running
1/16/2021 1:26:36 PM | | Config: don't use GPUs while dndclient_awesomium.exe is running
1/16/2021 1:26:36 PM | | Config: don't use GPUs while dndclient64.exe is running
1/16/2021 1:26:36 PM | | Config: don't use GPUs while DNDLauncher.exe is running
1/16/2021 1:26:36 PM | | Config: don't use GPUs while turbineclientlauncher.exe is running
1/16/2021 1:26:36 PM | | Config: use all coprocessors
1/16/2021 1:26:36 PM | | log flags: file_xfer, sched_ops, task, rr_simulation
1/16/2021 1:26:36 PM | Milkyway@Home | Found app_config.xml
1/16/2021 1:26:37 PM | | [rr_sim] doing sim: CPU sched
1/16/2021 1:26:37 PM | | [rr_sim] start: work_buf min 8640 additional 86400 total 95040 on_frac 0.961 active_frac 0.565
1/16/2021 1:26:37 PM | Milkyway@Home | [rr_sim] 0.02: de_modfit_84_bundle4_4s_south4s_bgset_4_1603804501_61530169_1 finishes (1.00 CPU + 1.00 NVIDIA GPU) (3.25G/135.60G)
1/16/2021 1:26:37 PM | Milkyway@Home | [rr_sim] 74.99: de_modfit_85_bundle4_4s_south4s_bgset_4_1603804501_61520344_1 finishes (1.00 CPU + 1.00 NVIDIA GPU) (10168.93G/135.60G)
1/16/2021 1:26:37 PM | Milkyway@Home | [rr_sim] 310.73: de_modfit_81_bundle4_4s_south4s_bgset_4_1603804501_61576619_0 finishes (1.00 CPU + 1.00 NVIDIA GPU) (42133.60G/135.60G)
1/16/2021 1:26:37 PM | Milkyway@Home | [rr_sim] 385.71: de_modfit_83_bundle4_4s_south4s_bgset_4_1603804501_61559680_1 finishes (1.00 CPU + 1.00 NVIDIA GPU) (42134.20G/135.60G)
<snip>
1/16/2021 1:26:37 PM | | [rr_sim] doing sim: work fetch
1/16/2021 1:26:37 PM | | [rr_sim] already did at this time
1/16/2021 1:27:37 PM | | [rr_sim] doing sim: CPU sched
1/16/2021 1:27:37 PM | | [rr_sim] start: work_buf min 8640 additional 86400 total 95040 on_frac 0.961 active_frac 0.565
1/16/2021 1:27:37 PM | Milkyway@Home | [rr_sim] 0.02: de_modfit_84_bundle4_4s_south4s_bgset_4_1603804501_61530169_1 finishes (1.00 CPU + 1.00 NVIDIA GPU) (2.30G/135.61G)
1/16/2021 1:27:37 PM | Milkyway@Home | [rr_sim] 77.15: de_modfit_85_bundle4_4s_south4s_bgset_4_1603804501_61520344_1 finishes (1.00 CPU + 1.00 NVIDIA GPU) (10462.83G/135.61G)
1/16/2021 1:27:37 PM | Milkyway@Home | [rr_sim] 310.71: de_modfit_81_bundle4_4s_south4s_bgset_4_1603804501_61576619_0 finishes (1.00 CPU + 1.00 NVIDIA GPU) (42133.60G/135.61G)
1/16/2021 1:27:37 PM | Milkyway@Home | [rr_sim] 387.84: de_modfit_83_bundle4_4s_south4s_bgset_4_1603804501_61559680_1 finishes (1.00 CPU + 1.00 NVIDIA GPU) (42134.20G/135.61G)
<snip>
1/16/2021 1:27:37 PM | | [rr_sim] doing sim: work fetch
1/16/2021 1:27:37 PM | | [rr_sim] already did at this time
1/16/2021 1:28:38 PM | | [rr_sim] doing sim: CPU sched
1/16/2021 1:28:38 PM | | [rr_sim] start: work_buf min 8640 additional 86400 total 95040 on_frac 0.961 active_frac 0.565
1/16/2021 1:28:38 PM | Milkyway@Home | [rr_sim] 0.01: de_modfit_84_bundle4_4s_south4s_bgset_4_1603804501_61530169_1 finishes (1.00 CPU + 1.00 NVIDIA GPU) (1.62G/135.62G)
1/16/2021 1:28:38 PM | Milkyway@Home | [rr_sim] 79.32: de_modfit_85_bundle4_4s_south4s_bgset_4_1603804501_61520344_1 finishes (1.00 CPU + 1.00 NVIDIA GPU) (10757.05G/135.62G)
1/16/2021 1:28:38 PM | Milkyway@Home | [rr_sim] 310.68: de_modfit_81_bundle4_4s_south4s_bgset_4_1603804501_61576619_0 finishes (1.00 CPU + 1.00 NVIDIA GPU) (42133.60G/135.62G)
1/16/2021 1:28:38 PM | Milkyway@Home | [rr_sim] 389.99: de_modfit_83_bundle4_4s_south4s_bgset_4_1603804501_61559680_1 finishes (1.00 CPU + 1.00 NVIDIA GPU) (42134.20G/135.62G)
<snip>
1/16/2021 1:28:38 PM | | [rr_sim] doing sim: work fetch
1/16/2021 1:28:38 PM | | [rr_sim] already did at this time
1/16/2021 1:28:53 PM | | Re-reading cc_config.xml
1/16/2021 1:28:53 PM | | Config: don't use GPUs while dndclient.exe is running
1/16/2021 1:28:53 PM | | Config: don't use GPUs while dndclient_awesomium.exe is running
1/16/2021 1:28:53 PM | | Config: don't use GPUs while dndclient64.exe is running
1/16/2021 1:28:53 PM | | Config: don't use GPUs while DNDLauncher.exe is running
1/16/2021 1:28:53 PM | | Config: don't use GPUs while turbineclientlauncher.exe is running
1/16/2021 1:28:53 PM | | Config: use all coprocessors
1/16/2021 1:28:53 PM | | log flags: file_xfer, sched_ops, task
1/16/2021 1:28:53 PM | Milkyway@Home | Found app_config.xml
 David   | 
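One way to read the rr_sim output above is to follow each task's estimate from one simulation pass to the next. The sketch below pulls the "finishes" lines out of pasted log text and checks whether a task's remaining-work figure ever goes down. Reading the last pair of numbers as estimated work remaining over device speed is an assumption, though it fits the figures (10168.93 / 135.60 ≈ 74.99). The function names are illustrative only, and the sample lines are copied from the log above.

```python
import re
from collections import defaultdict

# Matches lines such as:
# [rr_sim] 74.99: <task> finishes (1.00 CPU + 1.00 NVIDIA GPU) (10168.93G/135.60G)
FINISH_LINE = re.compile(
    r"\[rr_sim\]\s+(?P<finish>[\d.]+):\s+(?P<task>\S+)\s+finishes\s+"
    r"\([^)]*\)\s*\((?P<remaining>[\d.]+)G/(?P<rate>[\d.]+)G\)"
)

def remaining_by_task(log_text):
    """Return {task name: [remaining-work figure per pass, in log order]}."""
    history = defaultdict(list)
    for match in FINISH_LINE.finditer(log_text):
        history[match["task"]].append(float(match["remaining"]))
    return dict(history)

def remaining_never_shrinks(samples):
    """True when there are 2+ passes and the figure never goes down."""
    return len(samples) >= 2 and all(b >= a for a, b in zip(samples, samples[1:]))

T169 = "de_modfit_84_bundle4_4s_south4s_bgset_4_1603804501_61530169_1"
T344 = "de_modfit_85_bundle4_4s_south4s_bgset_4_1603804501_61520344_1"

# Three passes for the two running tasks, taken from the log in the post.
SAMPLE = f"""
1/16/2021 1:26:37 PM | Milkyway@Home | [rr_sim] 0.02: {T169} finishes (1.00 CPU + 1.00 NVIDIA GPU) (3.25G/135.60G)
1/16/2021 1:26:37 PM | Milkyway@Home | [rr_sim] 74.99: {T344} finishes (1.00 CPU + 1.00 NVIDIA GPU) (10168.93G/135.60G)
1/16/2021 1:27:37 PM | Milkyway@Home | [rr_sim] 0.02: {T169} finishes (1.00 CPU + 1.00 NVIDIA GPU) (2.30G/135.61G)
1/16/2021 1:27:37 PM | Milkyway@Home | [rr_sim] 77.15: {T344} finishes (1.00 CPU + 1.00 NVIDIA GPU) (10462.83G/135.61G)
1/16/2021 1:28:38 PM | Milkyway@Home | [rr_sim] 0.01: {T169} finishes (1.00 CPU + 1.00 NVIDIA GPU) (1.62G/135.62G)
1/16/2021 1:28:38 PM | Milkyway@Home | [rr_sim] 79.32: {T344} finishes (1.00 CPU + 1.00 NVIDIA GPU) (10757.05G/135.62G)
"""

history = remaining_by_task(SAMPLE)
for task, samples in history.items():
    print(task[-5:], samples, "never shrinks:", remaining_never_shrinks(samples))
```

On these lines the check flags the task ending 344_1, whose figure climbs on every pass, but not the one ending 169_1, whose figure keeps shrinking toward zero even though the poster reports both as stalled. It is a rough aid for reading the log and says nothing about the cause.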
|  mikey Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 | 
 
 This does nothing except reserve a full cpu core, the app uses what it uses and WE USERS can NOT control it. | 
| Joined: 14 Apr 17 Posts: 5 Credit: 361 RAC: 0 | 
 Installing nvidia driver 460.79 has fixed the problem for me, tried 461.09 again to double check and there is definitely something with the new driver that is breaking milkyway, or at least the current work units. To find older nvidia drivers, just fill in your gpu details here and grab 460.79 to install. Shouldn't need to do a clean install either, express worked fine for me. (edit) I know system restore should have reverted your driver to the previous version, but I still think it is worth trying to install the older driver from nvidia. I'm 100% sure it was the new driver that caused wu's to hang/crunch forever without completing. | 
|  d_a_dempsey Joined: 7 Jan 21 Posts: 14 Credit: 84,785,708 RAC: 0 | 
 The previous poster wrote: "Installing nvidia driver 460.79 has fixed the problem for me, tried 461.09 again to double check and there is definitely something with the new driver that is breaking milkyway, or at least the current work units." That was it! Sometime between 1/7 and 1/12 I must have updated the Nvidia driver, possibly the evening before Windows patched, so that the issue coincided with the patch/reboot. Thank you!! I am happily crunching through a backlog of MW@H WUs on this computer. David   | 
|  Keith Myers Joined: 24 Jan 11 Posts: 733 Credit: 564,748,565 RAC: 12,117 | 
 Glad you figured it out.  Just wanted to comment, you may have future or further issues with that many exclusive apps defined that intrude or prevent crunching.   | 
|  d_a_dempsey Joined: 7 Jan 21 Posts: 14 Credit: 84,785,708 RAC: 0 | 
 Keith Myers wrote: "Glad you figured it out. Just wanted to comment, you may have future or further issues with that many exclusive apps defined that intrude or prevent crunching." It does look like a lot, but they're different subprograms of a single application, Dungeons & Dragons Online, and you don't get 2 running at the same time. The overlap with crunching time is typically just 10pm-11pm, or midnight if I'm having a good game and there's no work in the morning. :) The game doesn't need a lot of CPU, but it certainly uses the 1080 TI. All crunching and no play would be boring. David   | 
 
        
        ©2025 Astroinformatics Group