Some tasks stalling
Author | Message |
---|---|
Joined: 4 Dec 14 Posts: 8 Credit: 88,342,179 RAC: 1,515 |
Has anyone been reporting issues with tasks stalling? I've had to cancel several work units that basically stalled and were showing days left to complete after processing for days. They were holding up other work units. Normally work tasks complete within the day they are started. I feel bad about cancelling them, but they were causing other units to go past their due dates. This is something I have never had issues with before. Unfortunately, I did not write down the name of the tasks, but they disappeared after aborting. |
Joined: 12 Nov 21 Posts: 236 Credit: 575,038,236 RAC: 0 |
Were they Separation or N-Body? |
Joined: 19 Jul 10 Posts: 623 Credit: 19,259,121 RAC: 375 |
Were they actually stalled, i.e. not using any CPU time? If yes, restarting the BOINC client should have fixed that. |
Joined: 4 Dec 14 Posts: 8 Credit: 88,342,179 RAC: 1,515 |
I have another one at the moment and it is an Nbody. It says it is 6.255% through the task and has been running for 10 hours and 42 minutes with 6 days 16 hours remaining. It appears to be progressing, but tomorrow it will show even more time remaining. The due date is today. All other tasks appear to be running normally, as I have a few other Projects queued up as well. G |
Joined: 4 Dec 14 Posts: 8 Credit: 88,342,179 RAC: 1,515 |
Stalled is just a description - it was still processing, but each time I looked at it, the time remaining had increased substantially. Other tasks were running normally. G |
Joined: 4 Dec 14 Posts: 8 Credit: 88,342,179 RAC: 1,515 |
All - I just did a restart of BOINC on the one I described (Nbody) and it did appear to change the numbers it was reporting from hours to just minutes again. I will check back on it later and report whether it finished or not. I do restart the machine about once a week out of habit, so it does occasionally get a reboot. Just confused on why this is occurring, as I had not had an issue before. G |
Joined: 4 Dec 14 Posts: 8 Credit: 88,342,179 RAC: 1,515 |
Thought the restart solved the issue, but no. It has run for 1.5 hours and is now reporting 2.5 hours left. Like I said, it will just keep churning but never finish. So is there something about the Nbody tasks that is causing the issue on my hardware? (something not adequate to complete the task?). I am running a Dell machine, Windows 11 Pro (22H2), NVIDIA Model GeForce GTX 1660 Ti for graphics; i7 for processor and 16 GB RAM. Task id is de_nbody_11_14_2022_v182_40k__data__4_1666898646_819123. I have an older Win10 machine running also, and I see the same phenomenon with an Nbody task showing as running for days, but I don't think that is accurate. Many more of this type of task are in the list waiting to start, so I may have the same issue going on with all of them? I have an NVIDIA card on that machine also; is that possibly the common thread? Just trying to understand the cause. Like I said, hate to abort the tasks. G |
Joined: 19 Jul 10 Posts: 623 Credit: 19,259,121 RAC: 375 |
"Stalled is just a description - it was still processing, but each time I looked at it, the time remaining had increased substantially." Stalled means the task is shown in BOINC as processing but is not using any CPU time, i.e. it's not actually processing. You can check that in the Windows Task Manager. BOINC simply does not check whether the application is really doing something; it starts the task and waits for the application to tell it when it's done. If it doesn't get progress updates from the application, it starts counting on its own, which is why you see the remaining time increasing. Task 642620787 was definitely stalled: lots of run time but no CPU time. If N-Body causes lots of issues on your computers, simply disable it in your project preferences; then you will not need to abort them. |
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
"Thought the restart solved the issue, but no. Has run for 1.5 hours and now reporting 2.5 hours left. Like I said, it will just keep churning but never finish. So is there something about the Nbody tasks that is causing the issue on my hardware? (something not adequate to complete the task?). I am running a Dell machine, Windows 11 Pro (22H2), NVIDIA Model GeForce GTX 1660 Ti for graphics; i7 for processor and 16 GB RAM. Task id is de_nbody_11_14_2022_v182_40k__data__4_1666898646_819123. I have an older Win10 machine running also, and I see the same phenomenon with an Nbody task showing as running for days, but I don't think that is accurate. Many more of this type of task are in the list waiting to start, so I may have the same issue going on with all of them? I have an NVIDIA card on that machine also; is that possibly the common thread?" Are you limiting the N-Body tasks to a set number of CPU cores, or are you letting MilkyWay decide how many to use? |
Joined: 4 Dec 14 Posts: 8 Credit: 88,342,179 RAC: 1,515 |
I will take a look at that - thanks for the suggestion. G |
Joined: 4 Dec 14 Posts: 8 Credit: 88,342,179 RAC: 1,515 |
I was not aware that specific tasks could be limited. I had reduced my CPU access to 50% in BOINC, have now adjusted that up to 80% and will see how things run. G |
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
"I was not aware that specific tasks could be limited. I had reduced my CPU access to 50% in BOINC, have now adjusted that up to 80% and will see how things run." That's not what I meant. What I meant was: do you have an app_config.xml file like this in your MilkyWay folder? <app_config> <app_version> <app_name>milkyway_nbody</app_name> <max_concurrent>1</max_concurrent> <plan_class>mt</plan_class> <avg_ncpus>2</avg_ncpus> <cmdline>--nthreads 2</cmdline> </app_version> </app_config> This particular file uses only 2 CPU cores per N-Body task AND runs only 1 task at a time. Your problem could be that you are using all your CPU cores on a single task and your PC is locking up, which your change in settings could have just made worse. Link has better info than I do on this and on whether it applies to you. Link also probably has a better file than I do, as I haven't run N-Body tasks for a while; I'm doing Separation tasks right now. |
Joined: 19 Jul 10 Posts: 623 Credit: 19,259,121 RAC: 375 |
"Link also probably has a better file than I do as I haven't run N-Body tasks for awhile as I'm doing Separation tasks right now." In fact, I have not run them for a while either. Your app_config looks right; for the 8-core machine I'd go with: <app_config> <app_version> <app_name>milkyway_nbody</app_name> <max_concurrent>1</max_concurrent> <plan_class>mt</plan_class> <avg_ncpus>4</avg_ncpus> <cmdline>--nthreads 4</cmdline> </app_version> </app_config> |
Joined: 26 Feb 23 Posts: 3 Credit: 12,724 RAC: 0 |
I have had the same problem since March, described in my thread "task is running infinite like in a loop", and now I see that somebody else reported exactly the same thing here on 24 January :-( I still don't have a solution. I have an i5-760 with 4 cores and a GTX 1070 and want to use all the capacity that is not needed by my own activities, so what configuration would you recommend? |
Joined: 18 Aug 20 Posts: 4 Credit: 48,412,272 RAC: 48,001 |
I have a question and possibly a problem with some tasks stalling. I am running an AMD Ryzen 7 5700U (16 cores supposedly) with Radeon Graphics CPU. Question: Almost always, my system processes with a status of "Running (0.903 CPUs + 1 AMD/ATI GPU (device 0 or device 1))". Does "Running 0.903 CPUs" mean it is using 90.3% of all CPUs or 90.3% of one CPU? Problem: At the same time, I have a task which reports 14.531% complete with the status "Running (12 CPUs)". However, this task is stalled and doesn't process (it is an N-Body Sim task). I have had several tasks like this over the past couple of weeks. The latest one had a deadline of 4/19, but as of today (4/20) the progress remains frozen at 14.531% and the status is still running. When I looked for the N-Body setting to check or uncheck (I don't remember which), the setting didn't show up. I don't know why the "Running (12 CPUs)" task stalls. Any answers or suggestions would be greatly appreciated. Thank you. |
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
"I have a question and possibly a problem with some tasks stalling" Question: No, it means you are crunching GPU tasks with your Radeon Graphics, each using 0.903 (about 90%) of one CPU core, which you probably should not be doing on a laptop. GPU tasks create a lot of heat, and because a laptop is designed to be light and powerful, its airflow is more compromised than in a desktop PC. Problem: Use Link's file from earlier in this thread. Copy and paste it into Notepad, NOT a word processing program, save it as "app_config.xml" (without the quotes, of course), and place it in the MilkyWay folder C:\ProgramData\BOINC\projects\milkyway.cs.rpi.edu_milkyway. After that, go back into the BOINC Manager, click Options, and then click "Read config files"; BOINC will then run the N-Body tasks with fewer than 16 cores per task. Link's file says to run a maximum of 1 task at a time (the top line), and then to use only 4 CPU cores per task. You can adjust that to fit what you want it to do, e.g. 2 tasks at a time each using 4 CPU cores, or even 3 tasks at a time each using 4 CPU cores, leaving 4 CPU cores available to you and whatever else you are doing on the PC. |
Joined: 16 Feb 13 Posts: 1 Credit: 53,032,313 RAC: 137 |
My recent average has taken a dive and I thought I would take a look at why. But even using the above XML settings my computer still tries to do 4 tasks at a time. I do work on two computing projects, but this one is set as the priority. I have the prefs set to 75% of CPUs and 15% CPU time. I have to set this because, even though I have a water cooler and a room at 20C, if left at 100% I end up in the 90C zone and then the CPU starts to throttle its speed. |
Joined: 19 Jul 10 Posts: 623 Credit: 19,259,121 RAC: 375 |
"I have the prefs set to 75% CPUs and 15% CPU time. I have to set this as even though I have a water cooler and a room at 20C if left to 100% I end up in the 90C zone and then the CPU starts to throttle the speed." If that is what you need to keep a water-cooled computer from throttling, there is either some serious issue with the cooling system (like the cooler not being properly attached, a failing pump or similar), or you overclock too much. Not even an air-cooled system should need settings that low if built and working properly. Your RAC likely took a dive because there are no more Separation tasks, which paid a lot more than N-Body for the same amount of CPU time. |
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
"My recent average has taken a dive and I thought I would take a look at why. But even using the above XML settings my computer still tries to do 4 tasks at a time." This will continue until you have finished all the tasks in your cache that were downloaded under the old 4-at-a-time settings; then it will switch to the new settings. |
Joined: 18 Aug 20 Posts: 4 Credit: 48,412,272 RAC: 48,001 |
I am also experiencing the stalling of N-Body simulation tasks. When I reboot my Win11 HP laptop with 16 GB of memory and a 1TB SSD, a task will run for maybe 5 to 30 minutes and then appear to stall, and the number in the progress column never changes after that. I can wait hours and it never changes. I resolve the "Progress" issue temporarily by 1) restarting my computer, 2) closing and reopening the BOINC Manager application, or 3) going to Activity on the menu bar and selecting "Suspend" for the CPU section. All tasks then report a status of "Suspended - user request (8 CPUs)". When I select Activity on the menu bar and select "Run always" or "Run based on preferences", the numbers in the progress column start counting up again. If I wait again, anywhere from 5 to 30 minutes, the N-Body Simulation task stalls and ceases to progress. Also, when I go into Windows Task Manager and look at the BOINC Manager process when the processing stops, the CPU use is zero (0) or maybe 0.1%. Once I "Suspend" the task and select "Run based on preferences" under Activity on the menu bar, the task starts counting up and the CPU usage for BOINC Manager returns to 40% (which is what I have my preferences set for). It almost seems like, for whatever reason, 1) a setting needs to be changed somewhere, 2) there is some sort of memory leak in the application, or 3) the system runs out of resources and can't continue until a reboot, a BOINC Manager restart, or the task being suspended and re-enabled. Does any of this make sense? |
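
For anyone who wants to try the suggestions in this thread, here is a minimal Python sketch, not an official BOINC tool. It does three things: it works out how many cores stay free for a given number of N-Body tasks and threads per task, it writes out an app_config.xml along the lines of the files posted earlier, and it applies Link's description of a stalled task (lots of run time but almost no CPU time). The function names and the 5% threshold are made up for illustration. One assumption to verify against the BOINC client documentation: the sketch puts `max_concurrent` under an `<app>` element, which is where I understand the client reads it, whereas the files quoted earlier in the thread put it inside `<app_version>`.

```python
import xml.etree.ElementTree as ET


def cores_left(total_cores: int, tasks: int, threads_per_task: int) -> int:
    """Cores left free when `tasks` N-Body tasks each use `threads_per_task`."""
    used = tasks * threads_per_task
    if used > total_cores:
        raise ValueError(f"{used} threads requested, only {total_cores} cores")
    return total_cores - used


def build_app_config(threads_per_task: int, max_tasks: int = 1,
                     app_name: str = "milkyway_nbody") -> str:
    """Return app_config.xml text that limits N-Body tasks.

    Assumption: max_concurrent goes under <app>; the thread's files place
    it under <app_version> instead. Check the BOINC docs before relying on it.
    """
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_tasks)
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = "mt"
    ET.SubElement(version, "avg_ncpus").text = str(threads_per_task)
    ET.SubElement(version, "cmdline").text = f"--nthreads {threads_per_task}"
    if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9 or newer
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


def looks_stalled(cpu_seconds: float, elapsed_seconds: float,
                  threads: int = 1, min_ratio: float = 0.05) -> bool:
    """True if CPU time is a tiny fraction of run time across all threads.

    The 5% default is an arbitrary illustration value, chosen to sit well
    below a deliberate CPU-time limit such as 40%.
    """
    if elapsed_seconds <= 0 or threads <= 0:
        return False
    ratio = cpu_seconds / (elapsed_seconds * threads)
    return ratio < min_ratio


if __name__ == "__main__":
    # 16 cores, 3 tasks of 4 threads each, as in the example in the thread.
    print("Cores left free:", cores_left(16, 3, 4))
    print(build_app_config(threads_per_task=4, max_tasks=3))
    # Ten hours of run time with half a minute of CPU time on 4 threads.
    print("Stalled?", looks_stalled(30, 10 * 3600, threads=4))
```

The run time and CPU time for a task can be read from its properties in the BOINC Manager or from the process entry in Windows Task Manager; the output of `build_app_config` would be saved as app_config.xml in the project folder mentioned earlier in the thread and loaded with "Read config files".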
©2024 Astroinformatics Group