Stalled computation
Author | Message |
---|---|
Joined: 18 Nov 21 Posts: 3 Credit: 34,918,994 RAC: 0 |
Hello, Lately on my laptop, active work units chug along fine with decreasing time remaining, then apparently go idle: CPU usage on each of the eight cores drops to a few percent while elapsed time and estimated time to completion both keep increasing. I have to exit and restart BOINC to get usage back to normal. More recently I get the following message in the event log: "3/8/2022 12:52:02 PM | Milkyway@Home | Task de_nbody_08_31_2021_v176_40k__data__11_1645561443_646216_1 postponed for 600 seconds: Waiting to acquire slot directory lock. Another instance may be running." According to Task Manager I see only one instance. I'd appreciate any thoughts on what I'm doing wrong. Thank you, Ed Machak |
Joined: 7 Jan 21 Posts: 14 Credit: 84,616,679 RAC: 0 |
You're not alone. I'm having this happen on 4 separate computers, and each time it is an N-Body simulation. The number of cores and the age of the chip don't seem to matter. It is very frustrating to find that your crunching has been held hostage by these units for hours. I don't babysit my computers and shouldn't have to. David |
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
Hello, Since your PCs are hidden, only an admin can look at your tasks and see what's going on. To see what gets shared if you unhide them, click on my name, then View, then Computers, and you can see mine. Nothing personal about ME is shared, just the PCs. |
Joined: 13 Feb 22 Posts: 1 Credit: 3,871,749 RAC: 0 |
I'm having the exact same problem, and I find the task gets stuck after about an hour. If this doesn't get fixed I'm going to have to drop the project, as it's not only affecting this BOINC project but also delaying processing for other projects. I can't babysit my computer just to donate computation power. |
Joined: 7 Jan 21 Posts: 14 Credit: 84,616,679 RAC: 0 |
Hello, Ed, What event log options do you have selected? I'm not seeing any messages when mine stall. David |
Joined: 12 Nov 21 Posts: 236 Credit: 575,038,236 RAC: 0 |
Are you guys seeing stalls on both n body AND separation? |
Joined: 7 Jan 21 Posts: 14 Credit: 84,616,679 RAC: 0 |
"Are you guys seeing stalls on both n body AND separation?" I'm only getting them on N-Body, and a lot of them. No issues with Separation. David |
Joined: 18 Feb 22 Posts: 5 Credit: 1,138,901 RAC: 0 |
I'm having problems with some tasks that go on indefinitely. The time remaining goes up rather than down, so the task never finishes. I have to abort the task so others can run. This happens on several Windows 10 computers. I don't know if it is just n-body or not. I will have to shift to other projects if this problem persists. |
Joined: 18 Feb 22 Posts: 5 Credit: 1,138,901 RAC: 0 |
Here is an example of an n-body task that stalled: task de_nbody_08_31_2021_v176_40k__data__12_1647295263_10374507_0. After it stalled, the CPU usage was just a few percent, suggesting computation was truly stalled. I tried a suspend and resume, but it remained stalled at 20.843% done. I also tried upping the CPU time usage from 75% to 100%, but it still remained stalled. After looking through earlier comments in this thread, I exited from the BOINC manager and restarted it. This worked; the task progressed normally and ran to completion. I hope this example helps if someone wants to dig into what's happening more than I can. |
Joined: 14 Feb 20 Posts: 1 Credit: 7,430 RAC: 0 |
I would like to know why a file that is supposed to download in, say, 28 minutes is taking several hours--or more--to complete. Thanks, XaurreauX |
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
"I would like to know why a file that is supposed to download in, say, 28 minutes is taking several hours--or more--to complete." Server problems on MilkyWay's end. |
Joined: 18 Feb 22 Posts: 5 Credit: 1,138,901 RAC: 0 |
I have had several more stalled n-body tasks today. The latest stalled at 14.004% done after running for over an hour, blocking all other BOINC tasks from running, so I will have to abort it and maybe switch to another project. Here are the stalled task details: Application: Milkyway@home N-Body Simulation 1.82 (mt); Name: de_nbody_08_31_2021_v176_40k__data__11_1647295263_13261164; State: Running; Received: 4/18/2022 9:20:16 AM; Report deadline: 4/30/2022 9:20:15 AM; Resources: 8 CPUs; Estimated computation size: 12,303 GFLOPs; CPU time: 00:08:57; CPU time since checkpoint: 00:02:01; Elapsed time: 01:27:29; Estimated time remaining: 08:57:12; Fraction done: 14.005%; Virtual memory size: 11.23 MB; Working set size: 13.29 MB; Directory: slots/4; Process ID: 6096; Progress rate: 9.720% per hour; Executable: milkyway_nbody_1.82_windows_x86_64__mt.exe |
Joined: 18 Feb 22 Posts: 5 Credit: 1,138,901 RAC: 0 |
This n-body task stalled at just 0.038% done with 232 days remaining. I give up but will check back later. Application: Milkyway@home N-Body Simulation 1.82 (mt); Name: de_nbody_08_31_2021_v176_40k__data__13_1647295263_13271884; State: Project suspended by user; Received: 4/18/2022 9:20:16 AM; Report deadline: 4/30/2022 9:20:15 AM; Resources: 8 CPUs; Estimated computation size: 12,267 GFLOPs; CPU time: 00:05:00; CPU time since checkpoint: 00:03:33; Elapsed time: 02:06:50; Estimated time remaining: 233d 03:26:21; Fraction done: 0.038%; Virtual memory size: 11.22 MB; Working set size: 5.24 MB; Directory: slots/4; Process ID: 13236; Executable: milkyway_nbody_1.82_windows_x86_64__mt.exe |
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
"This n-body task stalled at just 0.038% done with 232 days remaining. I give up but will check back later." What else does this PC do besides crunch MilkyWay n-body tasks? And is it doing that when this happens? I just ran through a few n-body tasks on my laptop and had zero slowdown, but the laptop was not doing anything else at the time, and it wasn't using 100% of the CPUs either. I did some runs using 12 cores and then a few more with 2 cores, running multiple tasks at once, but I also have BOINC set to use a max of 75% of the CPU cores and then only 90% of the CPU time. |
Joined: 18 Feb 22 Posts: 5 Credit: 1,138,901 RAC: 0 |
Turns out I almost fixed the stalling problem. A preventative workaround is to set the CPU time usage to 100% in the computing preferences, if it is set lower than that. After doing this, none of the n-body tasks I have run have stalled on the PCs that previously did stall, which all have Intel i7 CPUs. As noted in my earlier post, if an n-body task has already stalled, the only fix I know is to exit BOINC and restart. Note that setting CPU time usage to the maximum 100% will likely increase the CPU temperature. I was able to bring it down by eliminating overclocking and setting performance options to maximize stability and save energy. Some older PCs with duo and quad CPUs (both Intel and AMD) have never stalled on n-body tasks, so I have left the CPU time usage at the default 75% on those. May the 4th be with you. |
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
"Turns out I almost fixed the stalling problem." I'm glad you found the solution!! |
Joined: 12 Mar 22 Posts: 2 Credit: 979,779 RAC: 0 |
The n-body task runs forever. If I stop and restart it, then it finishes in just a few minutes. I'll try changing the computing preferences. If that does not help, I'll have to quit running Milkyway. |
Joined: 8 Nov 11 Posts: 205 Credit: 2,900,464 RAC: 0 |
Have you allocated all of your CPUs to it? Try a reduced number, like 4 if you have 8, i.e. 50% of your resources. |
Joined: 12 Mar 22 Posts: 2 Credit: 979,779 RAC: 0 |
Yes, I have. I dropped CPU usage to 80% and I still have the same problem. |
Joined: 8 Nov 11 Posts: 205 Credit: 2,900,464 RAC: 0 |
How many cores/processors have you got? |
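
For anyone who wants a quick way to check whether a task looks stalled, here is a minimal Python sketch based on the "CPU time" and "Elapsed time" fields from the task properties posted in this thread. The function names and the 0.5 threshold are hypothetical choices made for illustration, not anything BOINC or MilkyWay@home provides, and the sketch assumes both fields are in plain HH:MM:SS form. The idea is only that a task which is really computing should accumulate CPU time at a rate comparable to (or, for a multithreaded task, possibly above) wall-clock time, while the stalled tasks reported above accumulated very little.

```python
def parse_hms(text: str) -> int:
    """Convert an 'HH:MM:SS' string, as shown in the task properties, to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_to_elapsed_ratio(cpu_time: str, elapsed_time: str) -> float:
    """Return CPU time divided by elapsed (wall-clock) time."""
    elapsed = parse_hms(elapsed_time)
    if elapsed == 0:
        return 0.0  # nothing has run yet, so there is no ratio to report
    return parse_hms(cpu_time) / elapsed


def looks_stalled(cpu_time: str, elapsed_time: str, threshold: float = 0.5) -> bool:
    """Flag a task whose CPU time is a small fraction of its elapsed time.

    The 0.5 threshold is an arbitrary assumption for this sketch.
    """
    return cpu_to_elapsed_ratio(cpu_time, elapsed_time) < threshold


if __name__ == "__main__":
    # Values copied from the two stalled tasks reported in this thread.
    for cpu, elapsed in [("00:08:57", "01:27:29"), ("00:05:00", "02:06:50")]:
        ratio = cpu_to_elapsed_ratio(cpu, elapsed)
        print(f"CPU {cpu} / elapsed {elapsed}: ratio {ratio:.2f}, "
              f"stalled={looks_stalled(cpu, elapsed)}")
```

For the two tasks above, the ratios come out to roughly 0.10 and 0.04, which matches the "few percent" CPU usage people describe. This only confirms the symptom; the fixes reported in the thread are still restarting BOINC and, as a preventative workaround, setting CPU time usage to 100%.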
©2024 Astroinformatics Group