Welcome to MilkyWay@home

Posts by mikey

21) Message boards : Number crunching : Milkyway CPU usage reduced to zero, other processes after high cpu/ram usage (Message 76833)
Posted 29 Jan 2024 by mikey
Post:
The Milkyway project's CPU usage sometimes drops to zero and stays that way, especially if there is little free memory in the machine or if a background application with a high CPU/RAM requirement starts. After that, Milkyway does not restart even hours later, and BOINC client task switching doesn't work either. (I spent many, many hours looking at Resource Monitor and the BOINC client.)
I reproduced the error six times out of seven attempts on three computers using the following steps:
- the system/boinc client starts (Milkyway is running).
- I filled the RAM with data (free memory is about zero)
- I started Win-Defender from a .bat file with command line delay
- Windows-Defender completely loads the cpu/ram
- Milkyway detects high CPU usage and shuts down
- Windows-Defender ends (a lot of memory is freed up!)
- Milkyway did not start again, or stopped after 1...2 minutes, but the status changed to "Running".
- Manually suspending the Milkyway project (not a task): another project (Einstein for me) starts immediately and works normally.
- The operation of the Milkyway project is restored only after the Boinc client is restarted (until the next shutdown)
***
Notes:
- then the Milkyway project does not freeze, it simply does not work
- this stop also stops the Boinc client in the sense that task switching does not work. Because of this, other projects do not start either.
- the result of the "load test" was the same for other cpu-loading programs (browser, etc.), so the problem is not caused by the operation of the antivirus
- With little free memory, Milkyway sometimes crashes even without heavy CPU load
- The other project that works for me is Einstein. This does not cause an error. It did not stop even with multiple and persistent cpu/ram overloads. It can be seen from the cpu usage that Einstein also struggles, but it recovers. Its resource management is programmed to be very robust.
- when I realized this (three days ago) I stopped the Milkyway project. Only Einstein starts and has collected more credits in three days than previously in a week and a half.
***
Milkyway state is "Running", but no cpu usage: [screenshot not included]
Einstein project memory management: [screenshot not included]
****
Boinc: 7.24.1 (x64); Win 10 Pro (x64)


How many cpu's does your pc have in it? It shows MW using 4 cpu's; are you using an app_config.xml file to limit it? Also, what do you have for the settings in the Boinc Manager under Options, Computing preferences for 'when computer is in use' and 'when computer is not in use'? I use an app_config.xml file to limit each task to 2 cpu's and they just run non stop with no problems. I also have Boinc set to NOT suspend when the pc is in use and to NOT stop when 'Boinc cpu usage is above'. I also unchecked the box to 'suspend when mouse or keyboard input in last ___ minutes'. In short, on my pc's Boinc and MilkyWay run 24/7/365. Yes, I also run other Projects at the same time; I limit the total number of tasks MW can run at one time in the same app_config.xml file.
22) Questions and Answers : Windows : keep geting kicked off and not getting credits (Message 76829)
Posted 28 Jan 2024 by mikey
Post:
I just restarted doing this project after taking a break from it. I got credit for about a week then kept getting zero credits a day. When I check Boinc the project keeps getting delisted from the projects I have going. I re-add it and it starts to run and the next day I have no credit and the project is off my project list again. Looking at my stats here it shows 225 work units "Completed, validation inconclusive". These are all on n-body simulation units. How do I fix this?


I'll start with your last question first... validation inconclusive is MilkyWay's way of saying you are waiting on your wingman to finish their task before you get your credits. They generated A LOT of tasks by accident a week ago and all wingman tasks go to the end of the list, so give it another week or so and most/all of your tasks should start getting the credits they are owed.

As for why it keeps getting delisted... are you adding the project manually or selecting it from the list in the Boinc Manager under Tools, Add project? I ask because they went thru a couple of name changes, and if you use the one on the list it's the new Official name.
23) Message boards : News : Admin Updates Discussion (Message 76828)
Posted 28 Jan 2024 by mikey
Post:
All my Separation tasks are gone. *thumbsup*


Mine too, WOO HOO!!!
24) Message boards : Number crunching : Option in project preferences to set max CPUs (Message 76826)
Posted 27 Jan 2024 by mikey
Post:
Thanks, I've set mine to also run multiple lower CPU count WUs. Is there any performance increase you see for doing this?


No, I do it because MilkyWay isn't my prime focus right now and I can adjust the tasks up and down easily and quickly depending on when my other projects have the tasks I want.
25) Message boards : Number crunching : Tasks Completed, but validation tasks remain Unsent (Message 76823)
Posted 27 Jan 2024 by mikey
Post:
MW only sends out the original task then if it needs a wingman task it generates it
both WUs say
minimum quorum	        1
initial replication	2 

...not sure what's the difference between quorum and replication, but quite obviously a send task IS needed to complete validation.

I've not heard the term 'wingman' task, but the task to complete the validation for BOTH WUs had already been generated, but neither has been sent to be run ... both have status Unsent.


No, a wingman is not always needed. Apparently if you return a number of valid tasks in a row (I think the number is 10), the Server decides your pc is trustworthy and only periodically sends out a wingman task for that pc. BUT as soon as your wingman proves your pc is not trustworthy anymore, the process starts all over from zero again. Becoming untrustworthy can come from dust, overclocking, components wearing out, etc.

Link tried to explain WHY they haven't been sent out yet: the Server made a million tasks and all wingman tasks go at the end of the list, so in a couple of weeks we should be getting A LOT of _1 tasks (the initial tasks end in _0). Then everyone's tasks should be valid, or the Project will send out a 3rd task to try and figure out which of the first 2 pc's has the right answer.
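
For anyone curious, the trust idea described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the threshold of 10 is the number guessed at in the post, and the `HostTrust` class and its fields are made-up names, not the actual Boinc Server code.

```python
from dataclasses import dataclass

# Assumption: 10 is the figure guessed at in the post above;
# the real server setting may be different.
TRUST_THRESHOLD = 10


@dataclass
class HostTrust:
    """Hypothetical per-pc counter of valid results returned in a row."""
    consecutive_valid: int = 0

    def record(self, valid):
        # A valid result extends the streak; an invalid one resets it to zero.
        self.consecutive_valid = self.consecutive_valid + 1 if valid else 0

    @property
    def trusted(self):
        # A trusted pc would only get a wingman task periodically.
        return self.consecutive_valid >= TRUST_THRESHOLD


if __name__ == "__main__":
    host = HostTrust()
    for _ in range(10):
        host.record(True)
    print(host.trusted)    # True: ten valid results in a row
    host.record(False)
    print(host.trusted)    # False: one bad result starts the count over
```

The point of the sketch is just the reset: one bad result wipes out the whole streak, which matches the "starts all over from zero" behavior described above.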
26) Message boards : Number crunching : Option in project preferences to set max CPUs (Message 76822)
Posted 27 Jan 2024 by mikey
Post:
Other DC (boinc) projects [e.g., Amicable numbers] which potentially tie up all the CPUs on one's PC have a mechanism where the max number of CPUs that can be used by any task is set by the user... and this is a straight-forward option within the Project preferences without any monkeying around:
Preferences for this project	Amicable Numbers preferences
                         ^ You can set a limit on the number of CPU cores used

LLP


I use this app_config.xml file to make mine only use 2 cpu's per task

<app_config>
  <app_version>
    <app_name>milkyway_nbody</app_name>
    <plan_class>mt</plan_class>
    <avg_ncpus>2</avg_ncpus>
    <cmdline>--nthreads 2</cmdline>
  </app_version>
  <project_max_concurrent>2</project_max_concurrent>
</app_config>

It tells the pc to only use 2 cpu's per task and then to only run 2 tasks max. I adjust the project_max_concurrent line based on the pc it's on; this one is from my laptop, but my 5950X is running 15 tasks at a time when it runs them.
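
As a rough sketch (not part of the original post), the same file can be generated for different pc's with a small Python script. The template mirrors the app_config.xml above; `build_app_config` and `threads_in_use` are hypothetical helper names, and the one assumption is that the cpu threads MilkyWay uses is threads per task times the number of concurrent tasks.

```python
import xml.etree.ElementTree as ET

# Mirrors the app_config.xml above; only the two numbers vary per pc.
TEMPLATE = """<app_config>
  <app_version>
    <app_name>milkyway_nbody</app_name>
    <plan_class>mt</plan_class>
    <avg_ncpus>{threads}</avg_ncpus>
    <cmdline>--nthreads {threads}</cmdline>
  </app_version>
  <project_max_concurrent>{tasks}</project_max_concurrent>
</app_config>
"""


def build_app_config(threads, tasks):
    """Return app_config.xml text for the given per-task threads and task cap."""
    text = TEMPLATE.format(threads=threads, tasks=tasks)
    ET.fromstring(text)  # sanity check: raises ParseError if not well-formed
    return text


def threads_in_use(threads, tasks):
    """Upper bound on cpu threads MilkyWay uses with these settings."""
    return threads * tasks


if __name__ == "__main__":
    print(build_app_config(2, 2))    # the laptop settings shown above
    print(threads_in_use(2, 2))      # 4
    print(threads_in_use(2, 15))     # 30
```

With 2 threads per task, the 15-task setup mentioned above works out to 30 threads, so the task cap is the number to tune per pc.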
27) Message boards : News : Admin Updates Discussion (Message 76813)
Posted 26 Jan 2024 by mikey
Post:
This figure is fluctuating around a somewhat constant level.
Yes, and it should start to drop once we are through the pile of _0 tasks and start processing the resends. Until then (on average) we report one task, we get a replacement, and for the reported task a resend task is created, so the amount of ready to send tasks is pretty constant.


Personally I wish they could insert the _1 task at the beginning of the list so people aren't waiting as long, either that or just generate both the _0 and _1 task at the same time and then delay the _1 task by 1 or 2 days in case it's not needed. Then if it's not needed after it's sent out just delete it from the Server side so that it deletes it from us crunchers too.
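
The "pretty constant" ready-to-send number in the quote above can be illustrated with a toy queue in Python. This is a sketch under simplifying assumptions, not the real scheduler: every _0 task is assumed to need exactly one _1 resend, resends always go to the end of the list, and the function and task names are made up.

```python
from collections import deque


def ready_counts(n_initial):
    """Queue length after each reported task, with resends appended at the end."""
    queue = deque("wu{}_0".format(i) for i in range(n_initial))
    counts = []
    while queue:
        task = queue.popleft()            # a pc reports the task at the front
        if task.endswith("_0"):
            # Assumption: every _0 result triggers exactly one _1 resend.
            queue.append(task[:-1] + "1")
        counts.append(len(queue))
    return counts


if __name__ == "__main__":
    print(ready_counts(5))   # [5, 5, 5, 5, 5, 4, 3, 2, 1, 0]
```

The count stays flat while _0 tasks are being worked through and only starts dropping once the _1 tasks reach the front, which is also why the first wingman result arrives so late.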
28) Message boards : Number crunching : Tasks Completed, but validation tasks remain Unsent (Message 76809)
Posted 26 Jan 2024 by mikey
Post:
Two of my tasks are Completed, validation inconclusive:

https://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=963991857
24 Jan 2024, 4:42:42 UTC CPU time (sec) 43,471.86 or over 12 hours
yet Task 936324031 is still Unsent
and
https://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=964000770
23 Jan 2024, 22:51:35 UTC CPU time (sec) 54,474.03 or over 15 hours
yet Task 936319099 is still Unsent

Not running any more tasks for now.


Validation Inconclusive just means 'waiting for a wingman'; keep crunching and they will get validated in the end. MW only sends out the original task, then if it needs a wingman task it generates it, but it goes at the end of the list of available tasks, so they can take a while to validate.
29) Message boards : Number crunching : 300+ n body tasks validated with no credit (Message 76808)
Posted 26 Jan 2024 by mikey
Post:
In general there's nothing wrong with planned shutdowns for server maintenance, they are necessary, every project has them on more or less regular basis and since they are planned, everyone can adapt to them. No idea about Cosmology, never crunched for them, but well, like I said, most projects are worse than Milkyway and have REAL issues.


Yes that's true!!

Cosmology's problem is the lack of an Admin with the time or willingness to do what's needed to keep it going, or to let anyone else help. It's gotten so bad that you can't even log into the website without a coding workaround using your authenticator, which of course people like you with no existing account don't have, nor do those who didn't keep a copy of the project xml files from the Boinc directory someplace safe. I keep mine on a couple of usb sticks and of course on each pc, but those with just one pc can be out of luck trying to get tasks or even to log in and use the forums.

As for MilkyWay itself I am having zero problems crunching tasks using my app_config.xml file that limits it to 2 cpu cores per task, then I adjust the total number of tasks I want to run in the same app_config.xml file based on the pc I'm running the tasks on.
State: All (518) · In progress (142) · Validation pending (0) · Validation inconclusive (324) · Valid (50) · Invalid (2) · Error (0)

I am currently using MilkyWay as a backup, mostly zero resource share project until I reach some goals at other projects that have a highly variable number and types of tasks available.
30) Message boards : Number crunching : 300+ n body tasks validated with no credit (Message 76802)
Posted 24 Jan 2024 by mikey
Post:
Most problematic BOINC project? You don't seem to have much experience with problematic projects. That huge pile of ready to send tasks is perhaps not optimal, but that's all, not a real issue and seems to be fixed as we are down below 1 million now. SETI was a problematic project with its permanently overloaded servers, WCG is a problematic project since Krembil runs it. But Milkyway? Sure, I'm not going to say everything is perfect, it's not, but at least there's a steady WU supply and the communication from the project staff improved recently, so we know what's going on. It would be really great if Milkyway was the most problematic BOINC project and all others were better. Unfortunately most of them are worse and only very few are actually better.


Don't forget Seti had a planned shutdown every Tuesday as well, and don't forget about Cosmology either: it's been years now and they STILL don't have their certificate set up right!!
31) Message boards : News : Admin Updates Discussion (Message 76790)
Posted 21 Jan 2024 by mikey
Post:
... but that's nothing we can change by not crunching for Milkyway anyway, only project admin can fix it).


Well, by not crunching we can express our disappointment (and motivate project admin to fix the issue).


You are welcome to do what you like but they are making a MILLION tasks for us to crunch, I don't think some people quitting crunching is something they are going to notice, with a MILLION tasks they most certainly have very long term goals.
32) Message boards : Number crunching : 300+ n body tasks validated with no credit (Message 76783)
Posted 21 Jan 2024 by mikey
Post:
I'm severely tempted to stop crunching until this gets resolved.....

Any ideas?
It looks like the wing men also completed validation with no credit.


Supposedly as the Wingmen finish their tasks, as good ones, the Validator will assign credit to both people, and yours and mine and everyone else's tasks will get the credits they should. Part of the problem is that MW generated a BOATLOAD of tasks, and since they always add the Wingman task at the END of the queue it could be a while before it all sorts itself out. Read the Admin thread for a better explanation.
33) Message boards : News : Admin Updates Discussion (Message 76774)
Posted 20 Jan 2024 by mikey
Post:
same issue here all of yesterday and today WUs validated with 0.00 credits


Mine too but there is a new thing I THINK, it now says "initial replication 2"

name de_nbody_11_02_2023_v183_pal5__data__3_1705426859_64425
application Milkyway@home N-Body Simulation
created 16 Jan 2024, 17:51:23 UTC
minimum quorum 1
initial replication 2

932928572 857711 19 Jan 2024, 7:50:31 UTC 19 Jan 2024, 18:27:19 UTC Completed and validated 24,333.81 45,806.73 0.00 Milkyway@home N-Body Simulation v1.83 (mt)
windows_x86_64
34) Message boards : News : Admin Updates Discussion (Message 76766)
Posted 19 Jan 2024 by mikey
Post:
I guess you need to cancel separation tasks "waiting for assimilation" now, so they can finally be removed from our results lists.

Regarding one other change that appeared after the server maintenance: is ~3 million ready to send N-Body tasks the new target for the work generators? With that it will take up to two weeks before the _1 and any additional resend tasks make it through that pile (when we had 1.5 millions they needed around 5-7 days).

And since N-Body seem to always need two results to validate, wouldn't it make sense to set minimum quorum and with that initial replication to 2? Or are there any WUs that validate with only one result?


I am not sure why 3 million workunits were generated. The cap was set pretty low to 1000 (now 10,000) but it just ignored that, and I still haven't found the reason why. I will try to go in and remove/cancel these workunits so validations can be done.

I obviously have no clue how intimate you are with the Boinc Server side code, but apparently things are in SEVERAL places in the coding instead of just one. For example, one admin at a different project tried to change the credits given out for a task and found they were hard coded in at least 3 different sections. I'm NOT asking for a credit award change, I'm just using it as an example.

Also, I don't know if you know it, but there is a Boinc Admin email group where you can ask questions of other Boinc Project Admins who may have already been thru what you are, I'm assuming, still learning about.
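
As a side note on the numbers in the quoted question, the "up to two weeks" estimate follows from simple proportion if the pile is worked through at a steady rate. A small Python sketch, assuming the 1.5 million / 5-7 day figures from the quote and a constant throughput (`days_to_clear` is a hypothetical helper):

```python
def days_to_clear(backlog, ref_backlog, ref_days):
    """Scale a known drain time linearly to a different backlog size."""
    return backlog * ref_days / ref_backlog


if __name__ == "__main__":
    for ref_days in (5, 7):
        # 1.5 million tasks took about ref_days; estimate for 3 million.
        print(ref_days, days_to_clear(3_000_000, 1_500_000, ref_days))
```

Doubling the pile doubles the wait, so 5-7 days becomes roughly 10-14 days before the _1 tasks come around.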
35) Message boards : News : Commenting on Recent Issues with the server (Message 76761)
Posted 17 Jan 2024 by mikey
Post:
I'm 99.9% sure that those are Separation tasks that got stuck in the pipeline because of how they "finished" it. Also the pretty constant number of workunits waiting for validation, always around 460, is mainly showing Separation WUs I guess.


That makes sense, thanks!!
36) Questions and Answers : Windows : Something is wrong with N-Body Simulation: it counts units of 3538 gigaflops endlessly without progress (Message 76755)
Posted 12 Jan 2024 by mikey
Post:
If it's a purchased system then the fan on the cpu is probably not the best of the best, to keep the price where it was when you bought it. If you don't do things like that yourself, talk to a local computer shop about which fan would be a better choice for your system. If you do do things like that yourself, then a good all-in-one water cooled system should drop the temps 5 degrees with no problem; if you need to go further, then a full blown water cooled system can drop them 10 degrees with no problem.

An easy answer to the heat is to take off the side of the pc and let the heat out or you can even blow a fan into the now open side. But I know that's not always possible with life going on around us.
The Intel Core i7-1255U is a laptop CPU, so likely there's not much that can be done about the cooling.

Also in most ready-built PCs the CPU cooler isn't great and often not enough for running BOINC, but the main issue is usually the more or less complete lack of adequate air flow. Usually this can only be fixed with a new computer case unless you are sure you have a good one with just not enough fans in it. This should always be the first step; even the best cooler won't help if the heat is accumulating inside the case. This is particularly important when using the PC for things like BOINC. While today's CPUs will simply throttle to protect themselves, there are many other parts running hot in a case with no air flow, and some of those parts have no way to protect themselves from overheating.


All 3 of my laptops have a cooling system underneath the laptop blowing air up into it to help more air flow go thru it. And yes the only way to handle a self throttling laptop is to use less cores.
37) Questions and Answers : Windows : Something is wrong with N-Body Simulation: it counts units of 3538 gigaflops endlessly without progress (Message 76752)
Posted 11 Jan 2024 by mikey
Post:
As far as I can see, the program has really begun to work sustainably. Apparently, it is the frequent stops that drive it into closed loops.
Thanks for the advice!

But in general, it is a big drawback of the program that its work is not sustainable enough and it cannot prevent processor overheating.


If it's a purchased system then the fan on the cpu is probably not the best of the best, to keep the price where it was when you bought it. If you don't do things like that yourself, talk to a local computer shop about which fan would be a better choice for your system. If you do do things like that yourself, then a good all-in-one water cooled system should drop the temps 5 degrees with no problem; if you need to go further, then a full blown water cooled system can drop them 10 degrees with no problem.

An easy answer to the heat is to take off the side of the pc and let the heat out or you can even blow a fan into the now open side. But I know that's not always possible with life going on around us.
38) Message boards : News : Commenting on Recent Issues with the server (Message 76742)
Posted 3 Jan 2024 by mikey
Post:
I would like to ask about the numbers on the Server Status page...up in the right corner of the page it says this:

Computing status
Work
Tasks ready to send 1432459
Tasks in progress 101521

BUT in the bottom left corner of the page it says this:

Tasks by application
Application Unsent In progress
Milkyway@home N-Body Simulation 1001 45056

WHY is there over a million task difference in the number of tasks ready to send? And an over 50k difference in the tasks in progress?

Why aren't the two sets of numbers drawing from the same data so they are the same? I'm guessing some are still reporting the long gone now gpu tasks but WHY?
39) Questions and Answers : Windows : No new tasks in a long time, RAC is below 1,000 (Message 76735)
Posted 27 Dec 2023 by mikey
Post:
It's almost like the system refuses to give me new work since 6 tasks for some odd reason did not get started by the deadline (downloaded dec 3 and supposed to be reported by the 15th). But these tasks have not gone well for the others. 2 random sampled tasks have validation inconclusive. One, the second in line had a computation error (file not found on linux).

I have reset the project, expanded the days from .3 to .5 and it refuses to send work.
The other projects then take over.

Is it this project or is it BOINC or whats going on?


Some Projects will punish you for what happened but I don't think MilkyWay is one of them. Try setting No New Tasks, raising the cache size a little bit, then reversing the process again, and see if that works for you. What happens to me at other projects is that if I set my cache size that low I get tasks from project a but no tasks at all from project b or c, so I have to go thru that process too. OR I just suspend the other Projects, which in some cases gives me tasks from the other projects but not always; the no new tasks process works all the time for me.
40) Questions and Answers : Windows : No new tasks in a long time, RAC is below 1,000 (Message 76731)
Posted 26 Dec 2023 by mikey
Post:
It's almost like the system refuses to give me new work since 6 tasks for some odd reason did not get started by the deadline (downloaded dec 3 and supposed to be reported by the 15th). But these tasks have not gone well for the others. 2 random sampled tasks have validation inconclusive. One, the second in line had a computation error (file not found on linux).

I have reset the project, expanded the days from .3 to .5 and it refuses to send work.
The other projects then take over.

Is it this project or is it BOINC or whats going on?


Set the other projects to no new tasks and then up your cache to a full day or even 2 or 3 days, then ask MilkyWay for work. Once you get it, remember to set your cache back where you like it and set your other projects to get tasks again. MilkyWay tasks can think they will take a long time, and 0.3 or even 0.5 of a day may not be enough of a cache size.



As always Mikey, you have the answer!
Thanks and have a good holiday.


I'm glad it worked for you!!
Merry Christmas and a prosperous New Year to you as well!!



©2024 Astroinformatics Group