Message boards : Number crunching : GFLOPS backwards
| Author | Message |
|---|---|
|
Send message Joined: 19 Jul 10 Posts: 819 Credit: 21,098,268 RAC: 5,537 |
"Am I doing something wrong?"
Yes, it has been posted dozens of times all over the forums that MilkyWay WUs will get stuck if running at less than 100% of CPU time.
"I'm running 3 cores of my Ryzen 9 5900X at 20% of CPU time in order to keep its temperatures (and my electricity bill) within reasonable limits."
If your cooling can only handle 3 of the 12 cores of your CPU (and only if they are in use no more than 20% of the time), then it's completely broken and needs to be fixed or replaced. I also don't understand why people think running at less than 100% of CPU time is a sensible way of saving energy while crunching for BOINC projects. Limiting or disabling the boost frequencies is a way to save energy; keeping the machine running but idle 80% of the time (100% once the WU gets stuck) is not.
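For reference, a minimal sketch of checking and disabling boost from the command line on a Linux host. It assumes a cpufreq driver that exposes the global boost switch (acpi-cpufreq, and recent amd_pstate kernels); intel_pstate systems use a different knob, shown as a comment, and none of this persists across a reboot. On Windows the usual equivalent is the power plan's "maximum processor state" or a BIOS/PBO setting.

```bash
# Show whether boost/turbo is currently enabled (1 = on, 0 = off)
cat /sys/devices/system/cpu/cpufreq/boost

# Disable boost for all cores; re-enable later by writing 1
echo 0 | sudo tee /sys/devices/system/cpu/cpufreq/boost

# intel_pstate systems use an inverted switch instead:
# echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
```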
|
|
Send message Joined: 11 Sep 24 Posts: 24 Credit: 446,636 RAC: 5,755 |
Yes, this will be an update to a new application version with new code we need for our projects as well as various bug fixes. This will probably be released next week or the week after. We will make an announcement when we know when we will update. |
Keith MyersSend message Joined: 24 Jan 11 Posts: 739 Credit: 567,035,880 RAC: 33,929 |
Thanks for the update on the timeline. Looking forward to a new app.
|
|
Send message Joined: 5 Oct 25 Posts: 2 Credit: 1,957 RAC: 0 |
Okey doke. Having run a variety of projects for the last 20 years or so and having other things to do in my life than trawl innumerable fora for polite answers to a simple question, I'll take the easy route and remove MilkyWay from my list of projects. |
|
Send message Joined: 8 Aug 08 Posts: 27 Credit: 531,215 RAC: 18 |
Purest Green wrote: I'm running 3 cores of my Ryzen 9 5900X at 20% of CPU time in order to keep its temperatures (and my electricity bill) within reasonable limits.
And that is exactly why you are having problems with run times - nothing is getting done, and the task has to start from the beginning each time it restarts. Your Ryzen has 12 full cores plus 12 SMT ("helper") threads, for a total of 24 logical CPUs. Change the "Use at most X% of CPU time" setting back to 100%. Then change "Use at most X% of the CPUs" to 50%. You can lower that number again if the heat and power use is still a problem; reduce it by 1/24 (about 4.2 percentage points) to free one more thread at a time, until the heat output is manageable for you. Your tasks will run faster and run full time, instead of having to stop and start and only getting 12 seconds of work out of every minute. Imagine if you were trying to drive somewhere and your car only ran for 1,056 feet (20%) of every mile you want to travel... it would take you at minimum 5 times as long to go that mile, not including the time it would take you to stop the car, turn it off, then restart the engine and re-engage the drivetrain each time...
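The same two settings can also be applied without the Manager UI. Below is a minimal sketch of a global_prefs_override.xml (placed in the BOINC data directory, then re-read via the Manager's read-local-prefs option or a client restart); the tag names assume a current BOINC client, and the 50% value assumes the 24-thread Ryzen discussed above.

```xml
<!-- global_prefs_override.xml - sketch only -->
<global_preferences>
    <!-- "Use at most X% of CPU time": keep at 100 so tasks are never throttled -->
    <cpu_usage_limit>100</cpu_usage_limit>
    <!-- "Use at most X% of the CPUs": 50% of 24 threads = 12 threads in use -->
    <max_ncpus_pct>50</max_ncpus_pct>
</global_preferences>
```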
|
|
Send message Joined: 19 Jul 10 Posts: 819 Credit: 21,098,268 RAC: 5,537 |
Dr Who Fan wrote:
Purest Green wrote: I'm running 3 cores of my Ryzen 9 5900X at 20% of CPU time in order to keep its temperatures (and my electricity bill) within reasonable limits.
And that is exactly why you are having problems with run times - nothing is getting done, and it has to start from the beginning of the task each time it restarts.
That's not correct: the task stays in memory and should continue from where it was paused, even without a checkpoint, but the MilkyWay N-body application "doesn't like it", i.e. has a bug and gets stuck sooner or later.
Dr Who Fan wrote: It would take you at minimum 5 times as long to go that mile, not including the time it would take you to stop the car, turn it off, then restart the engine and re-engage the drivetrain each time...
No, the engine would still run during the breaks and burn fuel, just like the computer keeps running and burning energy while doing nothing if you set BOINC to less than 100% of CPU time. That feature is completely obsolete IMHO; it might have been the only way to keep poorly cooled single-core CPUs with a hardwired multiplier (Athlon XP, Pentium 4 and older) within reasonable temperatures, but using it on current CPUs is just complete nonsense. Limiting the clocks and, if that's not enough, limiting the threads in use is how you limit the heat on modern (or actually just not-so-ancient) CPUs. The last few hundred MHz add lots of heat (and power consumption) for very little additional work done, so for example I limited my Ryzen 5700G to 4 GHz; that's likely where his Ryzen 5900X would also run a lot more efficiently than at stock settings, i.e. boosting up to 4.8 GHz.
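As a rough illustration of "limiting the clocks", here is a sketch for a Linux host using the cpupower utility (from the linux-tools/kernel-tools package). Whether the cap takes effect depends on the cpufreq driver in use, the setting is lost at reboot, and on Windows the same result is usually reached through the power plan's maximum processor state or the BIOS.

```bash
# Show the current driver, governor and allowed frequency range (CPU 0 by default)
cpupower frequency-info

# Cap the maximum frequency of all cores at 4.0 GHz
sudo cpupower -c all frequency-set --max 4000MHz

# Undo later by raising the cap back to the hardware maximum, e.g.
# sudo cpupower -c all frequency-set --max 4800MHz
```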
|
|
Send message Joined: 1 Nov 10 Posts: 41 Credit: 2,879,221 RAC: 8,855 |
Most of what you say in reply to Dr Who Fan is wrong. There are processing overheads associated with swapping from one process to another. Depending on how the particular task was written, and what is being done at the time of the swap, a task may be able to resume cleanly, need to go back a small step, or in the worst case go back to the start of the task. Then there's the CPU's state - if the registers were in use at the time of the swap, can their values be stored and re-written (indeed, are they being stored so they can be re-written)? Does the task resume on the same CPU core it was running on ("real" vs. "virtual")? Is the task written to be able to swap CPU core type - if so it can restart, if not then it's back to the start. Then there's the thermal stress on the CPU: process swapping between work that presents the same thermal load is fairly trivial, but between work with very different thermal loads it is not.
As for the car example - just think... Here in Europe a lot of cars have "stop-start" technology which turns the fuel burning off when you stop, then starts it again when you are ready to move off. Fuel isn't used, but this increases the wear on the starting system and battery, and even without this "feature" there is additional wear on the brakes and transmission. And of course the stop/start cycle increases journey time.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe? |
|
Send message Joined: 19 Jul 10 Posts: 819 Credit: 21,098,268 RAC: 5,537 |
There are processing overheads associated with swapping from one process to another.
What swapping? There's no swapping: the BOINC client sends a command to the project application to pause computation, that's all. The application isn't "killed" or anything like that, it doesn't even get a request to exit, just to pause. It pauses by itself and continues by itself when it gets the command to do so. It doesn't need to go back to anything, as it's still "running", just not processing any new data at that moment.
Then there's the CPU's state - if the registers were in use at the time of the swap, can their values be stored and re-written (indeed, are they being stored so they can be re-written)? Does the task resume on the same CPU core it was running on ("real" vs. "virtual")? Is the task written to be able to swap CPU core type - if so it can restart, if not then it's back to the start.
If any of this mattered, we wouldn't be able to complete a single task, as the OS moves the application from core to core, or even stops it completely, whenever it thinks that's best for all the other applications, which run at higher priorities than BOINC applications. Furthermore, it would have been impossible to complete any work at all in the past on single-core CPUs, on which other applications squeeze their work in between the BOINC work dozens of times per second.
Here in Europe a lot of cars have "stop-start" technology which turns the fuel burning off when you stop
Yes, in that moment they don't burn any fuel, but they are still using energy for everything else and they will use energy to start the engine again. They simply burn the fuel to compensate for that later; the energy consumption isn't zero during the stop. And that's not that different from running BOINC at less than 100% of CPU time. Modern CPUs are able to turn unused cores off, but the rest is still running and burning energy.
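Whichever side of this argument one takes, the behaviour is easy to observe from the outside. A rough sketch for a Linux host - it assumes the N-body binary name contains "milkyway_nbody" (check with ps and adjust the pattern) and that pidstat from the sysstat package is installed: with "use at most 20% of CPU time" the per-second %CPU column flips between busy and near zero instead of staying around 100%.

```bash
# List the running N-body process(es); the name pattern is an assumption, adjust as needed
pgrep -af milkyway_nbody

# Sample their CPU usage once per second for a minute; with CPU-time
# throttling active the %CPU column alternates between ~100 and ~0
pidstat -u -p "$(pgrep -d, -f milkyway_nbody)" 1 60
```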
|
|
Send message Joined: 1 Nov 10 Posts: 41 Credit: 2,879,221 RAC: 8,855 |
What swapping? There's no swapping: the BOINC client sends a command to the project application to pause computation, that's all. The application isn't "killed" or anything like that, it doesn't even get a request to exit, just to pause. It pauses by itself and continues by itself when it gets the command to do so. It doesn't need to go back to anything, as it's still "running", just not processing any new data at that moment.
Some years ago I had good access to very low-level task monitoring software for the Intel series of processors, particularly those running Windows - it was very educational to see a trace of how a "BOINC controlled" task operated when using the "use x% of CPU time" function on a multi-core system. Even with a very lightly loaded CPU (running one single-core task and "only" the operating system) there were times when, during the pauses (the 100-x% of CPU time), a random, very short-lived operating system function would be run on that core. How did the "BOINC controlled" tasks behave? That varied dramatically: some would do a very clean restart, most would have to do some sort of restoration of state, while others would simply fail (more or less gracefully).
On the subject of cars and stop/start technology: taking my sample of one car, if I am in heavy traffic and not using the stop/start technology, I observe a 10-20% increase in fuel burn compared to using it.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe? |
|
Send message Joined: 19 Jul 10 Posts: 819 Credit: 21,098,268 RAC: 5,537 |
Even with a very lightly loaded CPU (running one single-core task and "only" the operating system) there were times when, during the pauses (the 100-x% of CPU time), a random, very short-lived operating system function would be run on that core.
This happens all the time on more heavily loaded systems, whether the BOINC application is suspended or not. It runs at the lowest possible priority, so the OS will stop it from running any time it needs a thread for another application.
|
|
Send message Joined: 1 Nov 10 Posts: 41 Credit: 2,879,221 RAC: 8,855 |
There is a difference between the interruption of a running non-OS process to run an OS-related process and a scheduled interruption of the same non-OS process. In the first case the OS gives notice that the interruption may take place, and only acts on that request if permission to interrupt is obtained; in the latter the scheduler can just pause the non-OS process, not giving it a chance to organise a graceful entry into the paused state (which is possibly one of the causes of the random failure of tasks to complete properly on systems that are using the "use x% of CPU time" option). Further, you are confusing the BOINC process and the task process - BOINC does run at very low priority, but many of the project tasks run at a higher level.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe? |
Keith MyersSend message Joined: 24 Jan 11 Posts: 739 Credit: 567,035,880 RAC: 33,929 |
I don't know at what priority level Windows runs things, but it is easy to see the various task and app process priority levels in Linux. The BOINC client runs at system level, nice 0, all the time on my Linux hosts. The various project CPU tasks run at nice 19, which is the lowest priority. GPU tasks run at nice 10, which is equivalent to "below normal" priority in Windows lingo, if I remember correctly. Those priority levels are actually coded in the client somewhere.
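For anyone who wants to verify this on their own Linux host, a quick sketch (the grep pattern is only an example, and the comm column truncates long binary names):

```bash
# PID, nice value, kernel priority and (truncated) command name, sorted by niceness -
# the client should show nice 0, CPU tasks nice 19 and GPU tasks nice 10
ps -eo pid,ni,pri,comm --sort=ni | grep -Ei 'boinc|milkyway'
```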
|
|
Send message Joined: 19 Jul 10 Posts: 819 Credit: 21,098,268 RAC: 5,537 |
It's no different on Windows: the BOINC client runs at "normal" priority, i.e. like most other applications, while project applications run at the lowest priority, just above the priority of the idle process - and I think that applies to both CPU and GPU tasks, at least that's what the BOINC documentation says. Of course you can change that in cc_config.xml; I run "special" applications (GPU, NCI, wrappers) with <process_priority_special>4</process_priority_special>, i.e. at "high" priority.
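For completeness, a sketch of where that tag lives in cc_config.xml (in the BOINC data directory; the client has to re-read its config files or be restarted for it to take effect). The value 4 meaning "high" follows the post above, and, as noted further down in this thread, the tag may only have an effect on Windows.

```xml
<!-- cc_config.xml - sketch only -->
<cc_config>
    <options>
        <!-- run "special" (GPU, NCI, wrapper) apps at high priority; 4 = "high" -->
        <process_priority_special>4</process_priority_special>
    </options>
</cc_config>
```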
|
Keith MyersSend message Joined: 24 Jan 11 Posts: 739 Credit: 567,035,880 RAC: 33,929 |
I've never seen any GPU task from any project run at anything other than nice 10. The only time I ever manually changed the process priority was back at Seti with the special apps and the schedtool utility. I used to run this script to bump GPU tasks to the highest priority:

```bash
#Run in root terminal, NOT sudo
nvidia-smi -pm 1
for (( ; ; ))
do
    # Assign CPU Priority (19=Nice/LowPriority, 0=Normal, -20=HighPriority)
    # This was code Petri gave out
    # GPU Tasks get high Priority
    schedtool -n -20 `pidof setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92`
    schedtool -n -20 `pidof astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100`
    # CPU Tasks get (a little) Below Normal Priority (0 being normal) to make sure it doesn't choke the OS
    schedtool -n 5 `pidof ap_7.05r2728_sse3_linux64`
    schedtool -n 5 `pidof MBv8_8.22r3711_sse41_x86_64-pc-linux-gnu`
    # Assign CPU Usage Threads (0-7)
    # Brent added this to Petri's code
    # Keep GPU tasks on threads 1 3 5 7 9 11 13 15
    schedtool -a 1,3,5,7,9,11,13,15 `pidof setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92`
    schedtool -a 1,3,5,7,9,11,13,15 `pidof astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100`
    # Keep CPU tasks on threads 0 2 4 6 8 10 12 14
    schedtool -a 0,2,4,6,8,10,12,14 `pidof MBv8_8.22r3711_sse41_x86_64-pc-linux-gnu`
    schedtool -a 0,2,4,6,8,10,12,14 `pidof ap_7.05r2728_sse3_linux64`
    # CPU Priority Assignment Script
    date
    # lscpu | grep MHz
    sleep 5
    echo " CPU Priority and Assignment Script (8 Threads)"
done
```

I never could get the process priority tags in cc_config.xml to work. The schedtool utility always worked.
|
|
Send message Joined: 19 Jul 10 Posts: 819 Credit: 21,098,268 RAC: 5,537 |
Well, maybe it's different on Linux. For me, on Windows, it was always running at the lowest priority IIRC. It didn't matter much with the old CAL applications for my HD3850 here, but later with OpenCL or CUDA on my GTX275 there was a significant difference, and since then I have stuck with high priority for "special" applications, as it hasn't had any negative effects so far.
|
Keith MyersSend message Joined: 24 Jan 11 Posts: 739 Credit: 567,035,880 RAC: 33,929 |
Well, the client gets compiled differently for Linux compared to Windows. If you care to investigate, you can peruse the /client directory in the BOINC GitHub repo and look through the differences in what is compiled and integrated into the client at the platform level. And apps run at slightly different priorities depending on whether the host is Windows or Linux, because the priority definitions in BOINC use Windows vernacular in the header definitions and need to be translated into Linux nice values. https://github.com/search?q=repo%3ABOINC%2Fboinc+PROCESS_MEDIUM_PRIORITY&type=code
This was commented on in the past:
https://boinc.berkeley.edu/forum_thread.php?id=8607&postid=50452
https://boinc.berkeley.edu/forum_thread.php?id=10296&postid=62286
|
Keith MyersSend message Joined: 24 Jan 11 Posts: 739 Credit: 567,035,880 RAC: 33,929 |
Well, maybe it's different on Linux. For me, on Windows, it was always running at the lowest priority IIRC. It didn't matter much with the old CAL applications for my HD3850 here, but later with OpenCL or CUDA on my GTX275 there was a significant difference, and since then I have stuck with high priority for "special" applications, as it hasn't had any negative effects so far.
Looking at the code again, the process_priority_special tag for cc_config.xml to raise the GPU task priority is only defined for Win32, not for Linux. Guess that is why it never worked for me, since I've always been a BOINC Linux user.
|