Milkyway@home N-Body Simulation v1.82 (mt) windows_x86_64 stuck

rcl1
Joined: 25 May 20
Posts: 2
Credit: 1,584,848
RAC: 109
Message 76336 - Posted: 12 Aug 2023, 8:05:49 UTC

These tasks seem to get "stuck". Time remaining counting up, CPU % going to nothing

mikey
Joined: 8 May 09
Posts: 3322
Credit: 520,689,248
RAC: 33,726
Message 76337 - Posted: 12 Aug 2023, 10:40:27 UTC - in response to Message 76336.  

These tasks seem to get "stuck". Time remaining counting up, CPU % going to nothing


Well, your PCs are hidden, so it's very hard to help you without more info. Here are some questions for you:
1: How many CPU cores are in your PC?
2: Did you restrict MilkyWay from using all of them, or did you use the defaults?
3: Are you using the CPU's integrated graphics to crunch as well, or do you have a standalone GPU?
4: In your settings, is BOINC using 100% of the CPU time, or did you restrict it to less than that?

rcl1
Joined: 25 May 20
Posts: 2
Credit: 1,584,848
RAC: 109
Message 76338 - Posted: 13 Aug 2023, 1:24:59 UTC - in response to Message 76337.  

129.84 1,376,814 7.20.2
GenuineIntel Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz [Family 6 Model 158 Stepping 10] (12 processors)
INTEL Intel(R) UHD Graphics 630 (4862MB) OpenCL: 3.0
Microsoft Windows 11 Core x64 Edition, (10.00.22621.0 0)

To keep the CPU temperature down (I'm in Phoenix), I usually limit BOINC to 2 CPUs and 50% CPU time.

mikey
Joined: 8 May 09
Posts: 3322
Credit: 520,689,248
RAC: 33,726
Message 76339 - Posted: 13 Aug 2023, 2:53:18 UTC - in response to Message 76338.  

129.84 1,376,814 7.20.2
GenuineIntel Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz [Family 6 Model 158 Stepping 10] (12 processors)
INTEL Intel(R) UHD Graphics 630 (4862MB) OpenCL: 3.0
Microsoft Windows 11 Core x64 Edition, (10.00.22621.0 0)

To keep the CPU temperature down (I'm in Phoenix), I usually limit BOINC to 2 CPUs and 50% CPU time.


Okay, that's a start... try upping that to 100% of the CPU time, and if you need to, drop it to a single CPU core to keep the temperature under control. Only future tasks will drop to a single core, not any existing tasks already on your PC.
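
For reference, these two limits can also be set locally in a global_prefs_override.xml file in the BOINC data directory, rather than through the Manager's computing preferences. The following is only a minimal sketch, assuming the standard max_ncpus_pct and cpu_usage_limit preference tags; the values are illustrative and should be adjusted to the machine.

```xml
<global_preferences>
   <!-- Illustrative value: share of logical processors BOINC may use
        (about 17% of 12 threads is roughly 2 CPUs). -->
   <max_ncpus_pct>17</max_ncpus_pct>
   <!-- Let running tasks use 100% of CPU time, i.e. no throttling. -->
   <cpu_usage_limit>100</cpu_usage_limit>
</global_preferences>
```

After saving the file, the client has to re-read its local preferences (in BOINC Manager this should be under Options > Read local prefs file) or be restarted before the change applies.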

If you click on my name, you can then click on my computers and see what's shared: nothing personal, just stats and lots of info about the tasks. You won't find any tasks for me because I'm not currently crunching MilkyWay tasks, but you will get the idea if you decide to share your PC(s).

keputnam
Joined: 22 Oct 10
Posts: 16
Credit: 143,923,405
RAC: 3,968
Message 76340 - Posted: 14 Aug 2023, 17:18:40 UTC - in response to Message 76339.  
Last modified: 14 Aug 2023, 18:05:38 UTC

I have observed similar behavior

Win10 Pro, Core i7-11700K, 8 cores, hyper-threaded. Milkyway limited to 3 threads to allow other projects to share the system

Most, but not all, of the new CPU N-Body jobs start fine, but after a while, the time remaining starts climbing, and Resource Monitor shows the job using .01% CPU

I've left several to run for quite a while, and none of them ever end. When looking at my returned tasks on this site, you can see that the processor time of the aborted tasks is nowhere near 3x wall clock time

Link
Joined: 19 Jul 10
Posts: 597
Credit: 18,982,651
RAC: 5,725
Message 76341 - Posted: 15 Aug 2023, 6:42:35 UTC - in response to Message 76340.  

And are you also not letting BOINC use 100% of CPU time? If so, that's the issue: N-Body doesn't like it.

keputnam
Joined: 22 Oct 10
Posts: 16
Credit: 143,923,405
RAC: 3,968
Message 76368 - Posted: 2 Sep 2023, 7:05:24 UTC - in response to Message 76341.  
Last modified: 2 Sep 2023, 7:05:51 UTC

Yes, I have CPU set to 100%, but BOINC limited to 3 "processors"

Two WUs have completed, most still "runaway"

keputnam
Joined: 22 Oct 10
Posts: 16
Credit: 143,923,405
RAC: 3,968
Message 76369 - Posted: 3 Sep 2023, 21:27:33 UTC - in response to Message 76368.  

And, of course, now that I've posted here, 6 of the last 7 WUs have completed successfully

keputnam
Joined: 22 Oct 10
Posts: 16
Credit: 143,923,405
RAC: 3,968
Message 76374 - Posted: 12 Sep 2023, 0:12:20 UTC - in response to Message 76369.  

So I am averaging about one bad WU in 4-5. But if I don't catch it and abort it, I will tie up 3 "CPUs" forever

Sorry, but this project isn't worth the hassle

I'll check back periodically to see if you've fixed it

meowr
Joined: 15 Feb 21
Posts: 3
Credit: 17,015,259
RAC: 2,654
Message 76375 - Posted: 12 Sep 2023, 2:57:59 UTC - in response to Message 76339.  

My problem is that the N-Body task is taking up almost 100% of my computer along with an Einstein task. These two are blocking all my other programs. I have well over a dozen of each waiting in the queue, it's been over a week since any others have run. I've now suspended each one and now have 16 tasks over 6 programs.

mikey
Joined: 8 May 09
Posts: 3322
Credit: 520,689,248
RAC: 33,726
Message 76376 - Posted: 12 Sep 2023, 10:24:19 UTC - in response to Message 76375.  

My problem is that the N-Body task is taking up almost 100% of my computer along with an Einstein task. These two are blocking all my other programs. I have well over a dozen of each waiting in the queue, it's been over a week since any others have run. I've now suspended each one and now have 16 tasks over 6 programs.


It sounds like you need to use an app_config.xml file in each project's folder under C:\ProgramData\BOINC\projects to limit the total number of tasks each project can run at one time. I use one like this:

<app_config>
<project_max_concurrent>1</project_max_concurrent>
</app_config>

That says only 1 task from that project is allowed to run at a time. You can of course change the number to reflect your own preferences; if you use a zero, it will use all available CPUs and GPUs on the PC for that project. To use the file, copy and paste the above into Notepad in Windows and save it in the folder for the project you want. I have one in every project folder and adjust them as my crunching needs change. Be sure to save the file as "app_config.xml" (no quotes), and when you are done, go into the BOINC Manager and click Options > Read config files so it takes effect right away.

Just be aware that this sets the maximum number of concurrent tasks for the whole project, not for one type of task. If you are running both CPU and GPU tasks from Einstein, you would need at least 2 in that line, but it does not guarantee one of each kind of task; that requires a more involved set of parameters. For me it doesn't matter, as I almost never run CPU and GPU tasks from the same project at the same time: the cache settings don't split CPU and GPU tasks, so sometimes I get a cache full of CPU tasks and my GPU sits idle.
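
As a rough sketch of that more involved setup, per-application limits go in separate app blocks, one per application, alongside or instead of the project-wide limit. The application name milkyway_nbody is taken from keputnam's config in this thread; other projects use their own application names, which you would need to look up, so treat the name as an assumption for any other project. The numbers are illustrative.

```xml
<app_config>
   <!-- Per-application cap: at most 1 N-Body task at a time. -->
   <app>
      <name>milkyway_nbody</name>
      <max_concurrent>1</max_concurrent>
   </app>
   <!-- Optional project-wide cap across all applications. -->
   <project_max_concurrent>2</project_max_concurrent>
</app_config>
```

Each application you want to limit gets its own app block with its own name and max_concurrent value.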

keputnam
Joined: 22 Oct 10
Posts: 16
Credit: 143,923,405
RAC: 3,968
Message 76377 - Posted: 12 Sep 2023, 16:35:01 UTC - in response to Message 76376.  

That will indeed limit Milkyway to 1 task

But it will STILL use all "CPUs"

This is my app_config, which also limits Milkyway to three "CPUs"

<app_config>
   <app>
      <name>milkyway_nbody</name>
      <max_concurrent>1</max_concurrent>
      <report_results_immediately/>
      <fraction_done_exact/>
      <gpu_versions>
         <gpu_usage>1</gpu_usage>
         <cpu_usage>.5</cpu_usage>
      </gpu_versions>
   </app>

   <app_version>
      <app_name>milkyway_nbody</app_name>
      <plan_class>mt</plan_class>
      <avg_ncpus>3</avg_ncpus>
      <cmdline>--nthreads 3 </cmdline>
   </app_version>
   <report_results_immediately/>
</app_config>

Jesse Viviano
Joined: 4 Feb 11
Posts: 86
Credit: 60,913,150
RAC: 0
Message 76381 - Posted: 13 Sep 2023, 1:45:06 UTC

I had one that was stuck. I eventually shut BOINC down to perform an OS update and GPU driver update. After rebooting, the stuck task suddenly rushed towards completion and quickly finished. Maybe all you need to do is quit BOINC with the option to shut down all tasks with it, and then restart BOINC.

meowr
Joined: 15 Feb 21
Posts: 3
Credit: 17,015,259
RAC: 2,654
Message 76382 - Posted: 13 Sep 2023, 3:58:02 UTC - in response to Message 76377.  

The one MW task uses all my computing power. I'm not a computer geek that can do all the above. I'm just suspending MW until all my uncompleted tasks expire. I'll try later and if these N-Body tasks show up, I'll delete MW.

Link
Joined: 19 Jul 10
Posts: 597
Credit: 18,982,651
RAC: 5,725
Message 76383 - Posted: 13 Sep 2023, 10:25:02 UTC - in response to Message 76381.  

I had one that was stuck. I eventually shut BOINC down to perform an OS update and GPU driver update. After rebooting, the stuck task suddenly rushed towards completion and quickly finished. Maybe all you need to do is quit BOINC with the option to shut down all tasks with it, and then restart BOINC.

Yes, restarting BOINC is the easiest way to get them running again.

Link
Joined: 19 Jul 10
Posts: 597
Credit: 18,982,651
RAC: 5,725
Message 76384 - Posted: 13 Sep 2023, 10:28:35 UTC - in response to Message 76382.  
Last modified: 13 Sep 2023, 10:34:46 UTC

The one MW task uses all my computing power.

Unless you've configured it differently in cc_config.xml, they only use computing power that you don't need for anything else. As long as you don't want to squeeze out the last bit of computing power, there's no need to configure anything; just let BOINC do its job. In the worst case you'll need to restart it sometimes, but that should not happen often if you simply let BOINC use 100% of CPU time.

keputnam
Joined: 22 Oct 10
Posts: 16
Credit: 143,923,405
RAC: 3,968
Message 76385 - Posted: 13 Sep 2023, 17:10:58 UTC - in response to Message 76383.  

No, it is not

I've rebooted and the stuck WUs remain stuck

Mway01902
Joined: 24 Sep 23
Posts: 6
Credit: 142,061
RAC: 88
Message 76401 - Posted: 28 Sep 2023, 1:12:19 UTC - in response to Message 76336.  

I'm new to this project and run several old laptops rather than one good processor.
I've seen the problem on two Windows computers and on no Linux ones.
Someone mentioned allowing BOINC 100% of CPU cycles. This wasn't a complete fix.
I also tried increasing the checkpoint time significantly. That wasn't really conclusive either.
What seems to help the most is bumping the Milkyway... task priority to Above Normal in the Task Manager Details tab.
This seems better than giving BOINC 100% of CPU cycles, because you can't get cycles if you're low priority and nnntube's playing.
This has worked best, but stalls aren't at zero if you use the computer for other purposes. It is near zero though.
It also only works while the task is still using CPU. It won't recover a stalled task. You still have to restart BOINC Manager with Stop Running Tasks... checked when you exit. I haven't seen the problem lately, so I don't know if you can just Suspend and Resume the task.
Raising the priority didn't affect normal computer operations, but they're slow computers and I don't expect much.
I changed the task priority manually. I don't know how to implement this automatically, but the tasks take so long even when working that I won't be overworked making manual tweaks.

Link
Joined: 19 Jul 10
Posts: 597
Credit: 18,982,651
RAC: 5,725
Message 76402 - Posted: 28 Sep 2023, 13:06:00 UTC - in response to Message 76401.  

If higher priority helps, then this entry in cc_config.xml can be used for it:
<cc_config>
 <options>
  <process_priority>N</process_priority>
 </options>
</cc_config>

Possible values are 0 (lowest priority, the default), 1 (below normal), 2 (normal), 3 (above normal), 4 (high) and 5 (real-time, not recommended).
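
For example, here is a minimal sketch with the placeholder N filled in. The value 3 is only an illustration (one step above normal, 2), roughly matching what was set by hand in Task Manager earlier in the thread; pick whatever suits your machine.

```xml
<cc_config>
 <options>
  <!-- Illustrative value: 3 is one step above normal (2). -->
  <process_priority>3</process_priority>
 </options>
</cc_config>
```

Note that cc_config.xml goes in the BOINC data directory, not in a project folder, and a client restart may be needed before the new priority applies to tasks that are already running.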