Welcome to MilkyWay@home

Server Maintenance 12:00 PM ET (16:00 UTC) 9/23/2022

Tom Donlon
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 10 Apr 19
Posts: 408
Credit: 120,203,200
RAC: 0
Message 74350 - Posted: 4 Oct 2022, 14:55:50 UTC
Last modified: 4 Oct 2022, 14:56:07 UTC

think the real reason is that if you want to run Separation only on the GPU, you have to turn off "Use CPU". Then, you of course can't run the N-Body.
You need to allow running the Separation GPU work units only, without the CPU Separation work units, while still allowing N-Body to run on the CPU.

(Don't someone tell me to use two BOINC instances. I have been doing that for ages. Not many people will.)


I'd love to make this change to the project. It's something that I've put on the list for the new project devs to look at down the line. It's frustrating that we force our users to have to run 2 configured instances of the client in order to effectively use their machines.

The latest N-Body Simulation tasks are taking a lot of resources: over two hours across 8 CPUs, which is 16-plus hours of CPU time. They were previously taking 4-6 minutes when I did them before. I can see why people are reluctant to run them.


I am working on a response to that issue in this thread: https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4924#74181. I also talk about it a little bit below.

In response to long-running tasks... FYI, over the years, it seems that when a new task group starts, the tasks with the lowest numbers (like below 1000000) in the task name just before the last _ (underscore) have long run times. The higher the number, the shorter the run time. So as we crunch through the low numbers, the run times typically and historically become shorter until the next group hits and the next group of 3 sequences starts. However, this often takes many weeks.

As a speculative guess (and I suppose it would have to be the MilkyWay staff who reply), I would think the low numbers start off as a point or pixel close to the core of the Milky Way, and then as the numbers increment, the referenced position advances outward. The further from the center, the smaller the effects on that position, meaning faster crunch time?

I would not hold my breath for any staff response, as I rarely see any staff from any site reply with additional information. In the meantime... we can speculate.


The way that the simulation and optimization works, there is no preference to place the dwarf galaxy at a specific point in the Milky Way. There are combinations of parameters (such as very dense dwarf galaxies) that cause the simulation to run for a long time. This is usually because the timestep resolution that you need to accurately simulate those systems is very small, so the simulation may choose to run 10,000 timesteps for very dense systems, but only 1,000 timesteps for a less dense system. Timesteps all take roughly the same amount of time to run, so in this example that would be a 10x increase in the time it would take to crunch that simulation.

Over time, these very dense systems should be ruled out (I say should... it appears they may not be ruled out in all cases) and you will only get simulations with the reasonable, less dense dwarf galaxies that don't take very long to run. So your average runtime goes down.
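
As a rough illustration of the timestep scaling described above, here is a minimal sketch. It assumes every timestep costs the same wall-clock time; the function name and the 0.5-second cost per timestep are made-up values for illustration, not measurements from the N-Body application.

```python
def estimated_runtime_seconds(n_timesteps: int, seconds_per_timestep: float) -> float:
    """Estimate runtime assuming every timestep costs the same amount of time."""
    return n_timesteps * seconds_per_timestep


# ASSUMPTION: hypothetical cost per timestep; real values depend on the host
# and on the work unit.
SECONDS_PER_TIMESTEP = 0.5

dense = estimated_runtime_seconds(10_000, SECONDS_PER_TIMESTEP)   # very dense dwarf galaxy
sparse = estimated_runtime_seconds(1_000, SECONDS_PER_TIMESTEP)   # less dense dwarf galaxy
slowdown = dense / sparse

print(f"very dense: {dense:.0f} s, less dense: {sparse:.0f} s, ratio: {slowdown:.0f}x")
```

The ratio depends only on the number of timesteps, which is why the same host can see a 10x difference between two work units from the same run.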
ID: 74350 · Rating: 0
Speedy51

Joined: 12 Jun 10
Posts: 57
Credit: 6,163,587
RAC: 156
Message 74354 - Posted: 4 Oct 2022, 20:22:33 UTC - in response to Message 74347.  


I contribute to various projects but sometimes focus on one at a time also. I assume that's what you're doing too. I'd just hope that people wouldn't stop contributing because the project is experiencing some difficulties.

Yes, I am doing the same as you in regard to currently focusing on a particular project. I agree; I hope people don't stop contributing just because the project is experiencing some difficulties. I will do my best to empty my cache by the deadline.
ID: 74354 · Rating: 0
DaveH52

Joined: 22 Apr 10
Posts: 3
Credit: 804,800
RAC: 0
Message 74360 - Posted: 6 Oct 2022, 5:34:36 UTC - in response to Message 74354.  

I've had a bunch of tasks, usually 8-CPU de_nbody tasks that should take about 9 minutes, where the longer they run, the longer they have left to run, and they aren't using any CPU time. They never finish and finally throw a "computation error" message. On the other hand, I've had a bunch that don't pause when I pause BOINC. BOINC Manager shows them paused, but Task Manager shows they're still running under VBox, even if I quit BOINC completely. A reboot finally kills them.
I've just suspended Milkyway because I have another task that's been running for hours but not making progress. The longer I let it run, the more time remains before it will finish. At that rate it never will.
DaveH52
ID: 74360 · Rating: 0
mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,940,047
RAC: 22,627
Message 74361 - Posted: 6 Oct 2022, 11:08:56 UTC - in response to Message 74360.  

I've had a bunch of tasks, usually 8-CPU de_nbody tasks that should take about 9 minutes, where the longer they run, the longer they have left to run, and they aren't using any CPU time. They never finish and finally throw a "computation error" message. On the other hand, I've had a bunch that don't pause when I pause BOINC. BOINC Manager shows them paused, but Task Manager shows they're still running under VBox, even if I quit BOINC completely. A reboot finally kills them.
I've just suspended Milkyway because I have another task that's been running for hours but not making progress. The longer I let it run, the more time remains before it will finish. At that rate it never will.
DaveH52


Your PC is hidden, so I can't tell: is it an 8-core PC, and are you letting the nbody tasks use every CPU core you have? If so, you should try an app_config file to reduce that to, say, 4 CPUs. The problem seems to be that the task is waiting for the PC to do something, but with all the CPU cores tied up by the task, it either can't happen or takes forever to happen.
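
For anyone who has not written an app_config file before, here is a minimal sketch that prints one limiting the multithreaded N-Body app to 4 threads. The app name "milkyway_nbody", the plan class "mt", and the "--nthreads" command-line option are assumptions on my part; check the names in your own client_state.xml before using it. The helper function itself is just for illustration.

```python
def build_app_config(app_name: str, plan_class: str, n_threads: int) -> str:
    """Return app_config.xml text limiting one multithreaded app to n_threads cores."""
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    return (
        "<app_config>\n"
        "  <app_version>\n"
        f"    <app_name>{app_name}</app_name>\n"
        f"    <plan_class>{plan_class}</plan_class>\n"
        f"    <avg_ncpus>{n_threads}</avg_ncpus>\n"
        f"    <cmdline>--nthreads {n_threads}</cmdline>\n"
        "  </app_version>\n"
        "</app_config>\n"
    )


# ASSUMPTIONS: the app name, plan class and --nthreads option are not confirmed
# anywhere in this thread; verify them against client_state.xml on your host.
config_text = build_app_config("milkyway_nbody", "mt", 4)
print(config_text)
```

Save the printed text as app_config.xml in the MilkyWay project folder inside the BOINC data directory, then use "Options" --> "Read config files" in BOINC Manager (or restart the client) so it takes effect.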
ID: 74361 · Rating: 0
Jamie2

Joined: 23 Jul 22
Posts: 2
Credit: 33,230,529
RAC: 0
Message 74372 - Posted: 6 Oct 2022, 21:40:15 UTC - in response to Message 74244.  

Notice still displayed in BOINC Manager even though it's outdated.
ID: 74372 · Rating: 0
mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,940,047
RAC: 22,627
Message 74376 - Posted: 7 Oct 2022, 10:39:09 UTC - in response to Message 74372.  

Notice still displayed in BOINC Manager even though it's outdated.


What notice is that?
ID: 74376 · Rating: 0
San-Fernando-Valley

Joined: 13 Apr 17
Posts: 256
Credit: 604,411,638
RAC: 0
Message 74377 - Posted: 7 Oct 2022, 13:44:27 UTC - in response to Message 74375.  

Notice still displayed in BOINC Manager even though it's outdated.

What notice - I can't see a notice in the post you have referred to.
ID: 74377 · Rating: 0
mike

Joined: 4 Oct 20
Posts: 1
Credit: 26,537,330
RAC: 13,297
Message 74382 - Posted: 8 Oct 2022, 11:41:01 UTC - in response to Message 74377.  

I think they are talking about the Server Maintenance message from 9/23 that still pops up in BOINC Manager.
ID: 74382 · Rating: 0
San-Fernando-Valley

Joined: 13 Apr 17
Posts: 256
Credit: 604,411,638
RAC: 0
Message 74384 - Posted: 8 Oct 2022, 15:15:27 UTC - in response to Message 74382.  

You are right, I sort of misunderstood it.
Thanks.
ID: 74384 · Rating: 0
Tom Donlon
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 10 Apr 19
Posts: 408
Credit: 120,203,200
RAC: 0
Message 74418 - Posted: 11 Oct 2022, 15:57:09 UTC

I've turned off exporting of this notice; hopefully that fixes it. Sometimes a notice can get stuck or keep showing up if the threads are busy (I still don't know if that's intended behavior from BOINC or if it's a bug).
ID: 74418 · Rating: 0
Frank

Joined: 2 Nov 10
Posts: 25
Credit: 1,894,269,109
RAC: 0
Message 74445 - Posted: 14 Oct 2022, 15:14:07 UTC

I am certainly relieved that the worrisome Notice won't be showing up on my computer any longer. However, I am concerned that the Project may be crashing. Task completions are down 40% and are falling.
You can't look at any of the various elements of the process and say that it is working well: task creation, task distribution, task execution, task completion. Task validation and task error detection are seriously flawed.
If we can't get back on course we are going to crash. What I need to know is, are we going to correct our course and get this turkey under control, or load the lifeboats?
ID: 74445 · Rating: 0
Tom Donlon
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 10 Apr 19
Posts: 408
Credit: 120,203,200
RAC: 0
Message 74446 - Posted: 14 Oct 2022, 15:24:48 UTC

I think I'm cursed so that whenever I travel, the server begins to time out. I was on a plane most of yesterday so I wasn't looking at MilkyWay, and then I got an email this morning telling me it had outages in the middle of the night.

This morning the server seemed to be running just fine again, but I restarted some processes and flushed the DB just in case. It all seems fine on my end, and the numbers look like they're improving.

We're very close to breaking through that 1k N-Body task waiting limit, at which point these validation-waiting WUs should begin to clear out.
ID: 74446 · Rating: 0
HRFMguy
Joined: 12 Nov 21
Posts: 236
Credit: 575,027,893
RAC: 38,188
Message 74447 - Posted: 14 Oct 2022, 16:20:58 UTC - in response to Message 74446.  

I think I'm cursed so that whenever I travel, the server begins to time out. I was on a plane most of yesterday so I wasn't looking at MilkyWay, and then I got an email this morning telling me it had outages in the middle of the night.

This morning the server seemed to be running just fine again, but I restarted some processes and flushed the DB just in case. It all seems fine on my end, and the numbers look like they're improving.

We're very close to breaking through that 1k N-Body task waiting limit, at which point these validation-waiting WUs should begin to clear out.
Can you run a script once per week to do what you just did? Would that help?
ID: 74447 · Rating: 0
Speedy51

Joined: 12 Jun 10
Posts: 57
Credit: 6,163,587
RAC: 156
Message 74448 - Posted: 14 Oct 2022, 21:11:17 UTC - in response to Message 74446.  
Last modified: 14 Oct 2022, 21:12:08 UTC


We're very close to breaking through that 1k N-Body task waiting limit, at which point these validation-waiting WUs should begin to clear out.

Thanks for keeping us up to date, Tom; hope your trip went well. To help lower pending validations more quickly, would it be helpful to focus on processing more N-Body tasks?
ID: 74448 · Rating: 0
johndad5

Joined: 29 Nov 10
Posts: 3
Credit: 23,839,804
RAC: 0
Message 74449 - Posted: 15 Oct 2022, 15:28:43 UTC

Hello,

Is what I am reading here the reason I am having such a large percentage of tasks failing because the task was not started by the deadline?
ID: 74449 · Rating: 0
Speedy51

Joined: 12 Jun 10
Posts: 57
Credit: 6,163,587
RAC: 156
Message 74454 - Posted: 15 Oct 2022, 21:17:15 UTC - in response to Message 74449.  

Hello,

Is what I am reading here the reason I am having such a large percentage of tasks failing because the task was not started by the deadline?

No, that's not the reason you are having such a large percentage of tasks not started before the deadline. This is happening because your computer is not able to process the work before the deadline. This is not your fault; you just have too many tasks in progress.
I would suggest setting "No new tasks" in the "Projects" tab.
ID: 74454 · Rating: 0
johndad5

Joined: 29 Nov 10
Posts: 3
Credit: 23,839,804
RAC: 0
Message 74457 - Posted: 16 Oct 2022, 4:52:32 UTC - in response to Message 74454.  

Is there a way I can set the project so I don't have this issue? The No New Tasks setting is only a temporary fix. This issue just cropped up recently, which also makes me think there has been a change.
ID: 74457 · Rating: 0
San-Fernando-Valley

Joined: 13 Apr 17
Posts: 256
Credit: 604,411,638
RAC: 0
Message 74458 - Posted: 16 Oct 2022, 5:24:10 UTC - in response to Message 74457.  

Is there a way I can set the project so I don't have this issue? The No New Tasks setting is only a temporary fix. This issue just cropped up recently, which also makes me think there has been a change.

Reduce the number of tasks in your queue.
Under "Options" --> "Computing preferences" --> "Store at least X days of work"
and "Store up to an additional X days of work".
Start with 0.1 and 0.1 respectively.
Then work your way up until you get just a few tasks in your queue.
That way you avoid running into deadlines.

Tasks are running much longer than previously.
Sometimes up to 24 hours and more - especially on "slow" PCs.
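
To get a feel for the numbers, here is a rough back-of-the-envelope sketch of how the two settings translate into queued tasks. It is a deliberately simplified model: the real BOINC client also weighs resource share, its own runtime estimates, and other attached projects. The function names are mine, and the 24-hour task length is just the upper figure mentioned above.

```python
import math


def buffered_work_hours(min_days: float, additional_days: float) -> float:
    """Hours of work the client tries to keep on hand (simplified model)."""
    return (min_days + additional_days) * 24.0


def approx_tasks_queued(min_days: float, additional_days: float,
                        hours_per_task: float, concurrent_tasks: int = 1) -> int:
    """Rough count of tasks needed to fill the buffer, never fewer than one."""
    hours = buffered_work_hours(min_days, additional_days)
    return max(1, math.ceil(hours * concurrent_tasks / hours_per_task))


# ASSUMPTION: 24-hour tasks, one task running at a time.
small_buffer = approx_tasks_queued(0.1, 0.1, hours_per_task=24.0)
large_buffer = approx_tasks_queued(10, 10, hours_per_task=24.0)

print(f"0.1 + 0.1 days -> about {small_buffer} task(s) queued")
print(f"10 + 10 days   -> about {large_buffer} task(s) queued")
```

With long tasks, a large buffer means the last tasks in the queue may not even start before their deadline, which is why starting small and working up is the safer approach.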

Hope this helps ...
ID: 74458 · Rating: 0
johndad5

Joined: 29 Nov 10
Posts: 3
Credit: 23,839,804
RAC: 0
Message 74466 - Posted: 16 Oct 2022, 14:31:24 UTC - in response to Message 74458.  

Thank you very much! I will give it a try.
ID: 74466 · Rating: 0

©2024 Astroinformatics Group