Welcome to MilkyWay@home

Server Maintenance 12:00 PM ET (16:00 UTC) 9/23/2022


Tom Donlon
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 10 Apr 19
Posts: 372
Credit: 97,536,112
RAC: 133,707
Message 74303 - Posted: 28 Sep 2022, 16:05:34 UTC

The server slowness recently was due to us copying data over from this server to the new machine. That's all done now and we should notice an increase in speed again.

I'm not exactly sure why the validation waiting count keeps climbing, but yeah it might have something to do with the sleep call in the validator code. Taking a look at why that is there is on our to-do list when we recompile binaries for the new server.

I think there are just fewer people who work on Nbody because it doesn't have a GPU application (so it's much less efficient in terms of credit), so it takes longer to crunch through the backlog. It's really not a problem as far as I can tell.

Speedy51
Joined: 12 Jun 10
Posts: 27
Credit: 4,400,473
RAC: 3,376
Message 74308 - Posted: 29 Sep 2022, 1:59:23 UTC - in response to Message 74301.  
Last modified: 29 Sep 2022, 2:06:02 UTC

Thanks, Mikey, for your input. I completely agree with what you are saying. I have processed all of my _3 and _2 tasks; now I am just working through my remaining 84 _1 tasks. Sometimes you get lucky with _1 tasks in that they validate as soon as you return them :-) Here is an example.
It is really handy when you get a task that runs for only seconds, as it helps empty your list of tasks quicker. N-body tasks waiting to be sent are slowly dropping, which is nice to see. As I write, there are 115991 queued waiting to be sent.

Tom, thanks for all of the hard work and for keeping us updated on what is happening.

Jim1348
Joined: 9 Jul 17
Posts: 100
Credit: 16,967,906
RAC: 368
Message 74309 - Posted: 29 Sep 2022, 9:24:50 UTC - in response to Message 74303.  
Last modified: 29 Sep 2022, 9:27:11 UTC

> I think there are just fewer people who work on Nbody because it doesn't have a GPU application (so it's much less efficient in terms of credit), so it takes longer to crunch through the backlog. It's really not a problem as far as I can tell.

I think the real reason is that if you want to run Separation only on the GPU, you have to turn off "Use CPU". Then, you of course can't run the N-Body.
You need to allow running the Separation GPU work units only, without the CPU Separation work units, while still allowing N-Body to run on the CPU.

(Don't someone tell me to use two BOINC instances. I have been doing that for ages. Not many people will.)

mikey
Joined: 8 May 09
Posts: 2984
Credit: 497,665,228
RAC: 193,990
Message 74310 - Posted: 29 Sep 2022, 9:59:39 UTC - in response to Message 74309.  

>> I think there are just fewer people who work on Nbody because it doesn't have a GPU application (so it's much less efficient in terms of credit), so it takes longer to crunch through the backlog. It's really not a problem as far as I can tell.
>
> I think the real reason is that if you want to run Separation only on the GPU, you have to turn off "Use CPU". Then, you of course can't run the N-Body.
> You need to allow running the Separation GPU work units only, without the CPU Separation work units, while still allowing N-Body to run on the CPU.

YES PLEASE!!

> (Don't someone tell me to use two BOINC instances. I have been doing that for ages. Not many people will.)

What I do is run this PC on GPU tasks and that PC on CPU tasks, and do it the other way around at other projects. Right now I'm mostly running CPU Separation tasks, but that's because I need hours for another WUProp badge. I also have a couple of GPUs that won't work elsewhere, so they too are doing Separation tasks.

Septimus
Joined: 8 Nov 11
Posts: 186
Credit: 2,407,443
RAC: 4,791
Message 74313 - Posted: 29 Sep 2022, 12:34:32 UTC - in response to Message 74310.  
Last modified: 29 Sep 2022, 12:35:02 UTC

The latest N-body Simulation tasks are taking a lot of resources: over two hours of run time across 8 CPUs, so 16-plus hours of CPU time. They previously took 4-6 minutes when I ran them. I can see why people are reluctant to run them.

mikey
Joined: 8 May 09
Posts: 2984
Credit: 497,665,228
RAC: 193,990
Message 74314 - Posted: 29 Sep 2022, 18:11:57 UTC - in response to Message 74313.  

> The latest Nbody Simulation tasks are taking a lot of resources, over two hours across 8 CPU’s 16 hours plus cpu time. They were previously taking 4-6 minutes when I did them before. Can see why people are reluctant to run them.

Are you using all of your CPU cores, i.e. 8, to run the tasks? If so, try running them using only 7 CPU cores, or even 6, and see if that helps. Sometimes the OS gets overwhelmed when all the CPU cores are running the exact same task here at MilkyWay.
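
One way to try this, sketched under the assumption that you set it locally rather than in the website preferences: the "use at most N% of the CPUs" setting can go in a global_prefs_override.xml file in the BOINC data directory. The 8-core host and the 75% figure (6 of 8 cores) are just the example numbers from this post, not a recommendation.

```xml
<!-- global_prefs_override.xml in the BOINC data directory -->
<global_preferences>
   <!-- use at most 75% of the CPUs: 6 of 8 cores (87.5 would be 7 of 8) -->
   <max_ncpus_pct>75.0</max_ncpus_pct>
</global_preferences>
```

The client should pick this up after a restart. Local overrides take priority over the web preferences, so remove the file again if you want the website settings back.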

Septimus
Joined: 8 Nov 11
Posts: 186
Credit: 2,407,443
RAC: 4,791
Message 74315 - Posted: 29 Sep 2022, 19:45:34 UTC - in response to Message 74314.  

Thanks, Mikey. I was only using 8 out of 16 cores, and the system sometimes showed 60% usage. It seems to be a batch of long simulations. Maybe I'll try a reduction down to 6 or 4.

Speedy51
Joined: 12 Jun 10
Posts: 27
Credit: 4,400,473
RAC: 3,376
Message 74318 - Posted: 29 Sep 2022, 20:51:32 UTC - in response to Message 74314.  

I wish I could use my other 7 cores, leaving one free for the GPU. It seems to me that it will only use a maximum of 16, even with the following in app_config.xml in the project folder:
<app_config>
<app>
<name>milkyway</name>
<fraction_done_exact/>
 <CPU_version>
<cpu_usage>23</cpu_usage>
</CPU_version>
</app>
</app_config>

Skillz
Joined: 28 May 17
Posts: 21
Credit: 1,870,361,266
RAC: 17,978,835
Message 74327 - Posted: 30 Sep 2022, 20:41:42 UTC - in response to Message 74318.  

> I wish I could use my other 7 cores leaving one free for the GPU, seems to me will only use a maximum of 16 even with the following in app_config.xml in the project folder
> <app_config>
> <app>
> <name>milkyway</name>
> <fraction_done_exact/>
>  <CPU_version>
> <cpu_usage>23</cpu_usage>
> </CPU_version>
> </app>
> </app_config>

The easiest way to get this done would be to run two BOINC instances on the same host. Set one for CPU-only work, and in its BOINC Manager CPU settings limit it to 23 cores (the setting is a % of your total cores). Set the other one to run GPU only.
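
As a rough sketch of the CPU-only instance (an assumption about one way to set it up, not something verified on this host): each client needs its own data directory, and a cc_config.xml along these lines in that directory would allow a second client to run, keep it off the GPU, and cap it at 23 CPUs. This uses the cc_config ncpus option instead of the Manager percentage setting; the value 23 comes from the post above, and the directory layout is hypothetical.

```xml
<!-- cc_config.xml in the data directory of the CPU-only client (hypothetical layout) -->
<cc_config>
   <options>
      <!-- let this client start even though another BOINC client is already running -->
      <allow_multiple_clients>1</allow_multiple_clients>
      <!-- this instance should not touch the GPU -->
      <no_gpus>1</no_gpus>
      <!-- act as if the host had 23 CPUs, leaving one thread free for the GPU instance -->
      <ncpus>23</ncpus>
   </options>
</cc_config>
```

The second client would then be started with its own data directory and GUI RPC port, for example boinc --dir <data dir> --gui_rpc_port <port>, with BOINC Manager or boinccmd pointed at that port.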

Speedy51
Joined: 12 Jun 10
Posts: 27
Credit: 4,400,473
RAC: 3,376
Message 74328 - Posted: 30 Sep 2022, 21:15:31 UTC - in response to Message 74327.  


> The easiest way to get this done would be running two BOINC instances on the same host. Set one for CPU only work, then set the CPU settings in the BOINCManager to only use 23 cores (It'll be a % of your total cores) and the other one set to run GPU only.

Currently MilkyWay is the only project running and "% of total cores" is set to 100%, yet BOINC still says the task is using "16 CPUs". In Task Manager, "CPU %" is between 55 and 56, so I am not sure this will work. Are you running 2 BOINC instances on the same drive?

jdzukley
Joined: 26 May 11
Posts: 32
Credit: 42,211,108
RAC: 15,990
Message 74329 - Posted: 1 Oct 2022, 0:47:23 UTC - in response to Message 74314.  
Last modified: 1 Oct 2022, 1:01:55 UTC

In response to long-running tasks: FYI, over the years, it seems that when a new task group starts, the tasks with the lowest numbers (like below 1000000) in the task name just before the last _ (underscore) have long run times. The higher the number, the shorter the run time. So as we crunch through the low numbers, the run times typically and historically become shorter until the next group hits, and the next group of 3 sequence starts. However, this often takes many weeks.

As a speculative guess (and I suppose it would have to be the MilkyWay staff who reply), I would think the low numbers start off as a point or pixel close to the core of the Milky Way, and then as the numbers increment, the referenced position advances outward. The further from the center, the smaller the effects on that position would be, meaning faster crunch time?

I would not hold my breath for any staff response, as I rarely see staff from any site reply with additional information. In the meantime... we can speculate.

AndreyOR
Joined: 13 Oct 21
Posts: 43
Credit: 60,932,436
RAC: 374,467
Message 74332 - Posted: 1 Oct 2022, 23:11:51 UTC - in response to Message 74318.  
Last modified: 1 Oct 2022, 23:14:16 UTC

N-Body can only use a maximum of 16 cores per task so you'll have to reduce the number of cores per task and run multiple tasks simultaneously. Unless modified, it'll by default use all available cores up to 16. I know some people do it but I've never worried about leaving a core free for GPU. It doesn't seem to make a difference although I've never done a more detailed test.

Additionally, your app_config looks incorrect and is likely ignored. Check here for the correct format and syntax of the file: https://boinc.berkeley.edu/wiki/Client_configuration#Project-level_configuration. If you're just trying to modify N-Body, "milkyway" is not the right name and you'd need to use the app_version section of app_config.

Speedy51
Joined: 12 Jun 10
Posts: 27
Credit: 4,400,473
RAC: 3,376
Message 74333 - Posted: 2 Oct 2022, 6:29:24 UTC - in response to Message 74332.  

Thank you for the information. Using only 8 cores seems to have decreased the runtime; however, I am still only running 1 task at a time and am not sure what to do to increase it to 2.

mikey
Joined: 8 May 09
Posts: 2984
Credit: 497,665,228
RAC: 193,990
Message 74334 - Posted: 2 Oct 2022, 11:12:32 UTC - in response to Message 74333.  

> Thank you for the information. Using only 8 cores seems to have decreased the runtime, however I am still only running 1 task at a time not sure what to do to increase it to 2

You add something like this to your app_config.xml file inside the MilkyWay project folder:

<app_config>
<!-- max_concurrent caps how many N-body tasks run at once -->
<app>
<name>milkyway_nbody</name>
<max_concurrent>1</max_concurrent>
</app>
<!-- avg_ncpus and --nthreads set the threads used per task -->
<app_version>
<app_name>milkyway_nbody</app_name>
<plan_class>mt</plan_class>
<avg_ncpus>2</avg_ncpus>
<cmdline>--nthreads 2</cmdline>
</app_version>
</app_config>

By changing the <max_concurrent> number to 2, and then changing <avg_ncpus> and the --nthreads value to 8, it should run 2 tasks at a time using 8 CPU cores for each task. Be sure to use Notepad in Windows or a text editor in Linux, as word processing programs add hidden formatting that will cause BOINC to ignore the whole file.

Speedy51
Joined: 12 Jun 10
Posts: 27
Credit: 4,400,473
RAC: 3,376
Message 74337 - Posted: 2 Oct 2022, 21:58:43 UTC - in response to Message 74334.  

Thanks, Mikey. Apologies to everyone: in making changes to my app_config file I trashed 20 tasks. No more CPU tasks for me. I will also wait until the number of tasks waiting for validation comes down before contributing again with my GPU.

AndreyOR
Joined: 13 Oct 21
Posts: 43
Credit: 60,932,436
RAC: 374,467
Message 74338 - Posted: 3 Oct 2022, 3:34:19 UTC - in response to Message 74337.  
Last modified: 3 Oct 2022, 3:38:35 UTC

Making changes to the app_config usually doesn't make tasks error out as most of the entries affect BOINC only (not the project app) and incorrect formatting or syntax is usually ignored. However, there's one, optional, entry that affects the project app, cmdline (where --nthreads is usually the argument) and it can cause the tasks to crash if the format or syntax is incorrect. That's what made your tasks crash. Here's from the error log of one of the tasks:
Argument parsing error: --nthreads>2: unknown option
Failed to read arguments

The syntax is --nthreads x, where x is the number of threads you want to use. Having said that, the optional entry cmdline --nthreads ... is unnecessary for MilkyWay N-body as avg_ncpus does the job for both BOINC and the project app.
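
If you do want to keep the optional cmdline entry anyway, a minimal sketch of a well-formed version might look like the following. The thread count of 8 is only an example value. The point is that the tag holds the option, a space, and a number, with no stray characters such as the ">" that ended up in the failed tasks, and that the number matches avg_ncpus.

```xml
<app_config>
   <app_version>
      <app_name>milkyway_nbody</app_name>
      <plan_class>mt</plan_class>
      <!-- what BOINC budgets per task -->
      <avg_ncpus>8</avg_ncpus>
      <!-- passed to the N-body app as-is: option, space, thread count -->
      <cmdline>--nthreads 8</cmdline>
   </app_version>
</app_config>
```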

Here's an app_config that has N-body use 4 threads per task and runs 2 separation GPU tasks simultaneously.
<app_config>
   <app>
      <name>milkyway</name>
      <gpu_versions>
          <gpu_usage>.5</gpu_usage>
          <cpu_usage>.9</cpu_usage>
      </gpu_versions>
   </app>
   <app_version>
      <app_name>milkyway_nbody</app_name>
      <plan_class>mt</plan_class>
      <avg_ncpus>4</avg_ncpus>
   </app_version>
</app_config>

One of the reasons that only one N-body is running could be due to how you allocated resources to BOINC itself and to various projects. BOINC uses that info to determine how many tasks of which project to run and when. max_concurrent is only a max limiter and won't force BOINC to run a certain amount of tasks.
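
As a sketch of how that fits together for the "two N-body tasks at once" question, assuming BOINC is allowed to use at least 16 threads on the host: max_concurrent goes in the app section and only caps the count, while avg_ncpus in the app_version section sets the threads per task. The numbers 2 and 8 are illustrative.

```xml
<app_config>
   <app>
      <name>milkyway_nbody</name>
      <!-- upper bound only: BOINC may still run fewer than 2 at once -->
      <max_concurrent>2</max_concurrent>
   </app>
   <app_version>
      <app_name>milkyway_nbody</app_name>
      <plan_class>mt</plan_class>
      <!-- 2 tasks x 8 threads = 16 threads when both are running -->
      <avg_ncpus>8</avg_ncpus>
   </app_version>
</app_config>
```

If only one task still runs with a file like this, the limit is probably coming from the CPU percentage or resource share settings rather than from app_config.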

I'd suggest not worrying about runtimes. The credit per unit of runtime is pretty much the same regardless of how long a task takes. It's perfectly fine to run N-body on 1 or 2 cores. Too many cores will actually make things less productive. I, for example, found that 4 cores per task gives the best tasks/hour rate, and anything higher than about 9 cores is no better than, and can even be worse than, 2 cores.

May I suggest you don't give up on CPU tasks, or GPU ones for that matter. Just take the time to figure out the resource allocation and make a valid app_config. The high validation backlog will come down in due time. The current issue is nothing close to the aftermath of the disk crash a few months ago, and in the end everything straightened out: no tasks were lost and everyone got their credit. This will be no different. Users leaving is very likely worse for the project, as that means fewer PCs to clear out the validation backlog and the high N-Body queue, so everything will take longer.

San-Fernando-Valley
Joined: 13 Apr 17
Posts: 215
Credit: 131,318,912
RAC: 14,971
Message 74340 - Posted: 3 Oct 2022, 5:06:19 UTC - in response to Message 74338.  

+1

Speedy51
Joined: 12 Jun 10
Posts: 27
Credit: 4,400,473
RAC: 3,376
Message 74342 - Posted: 3 Oct 2022, 6:42:10 UTC - in response to Message 74338.  

Thanks for your feedback and your app_config data. I was thinking that, with all the pending work I was creating, I was adding extra pressure to the database; this is clearly not the case.
Once my current GPU task is complete I will more than likely come back here with it. I have other plans for my CPUs currently, but I will certainly consider bringing them back over here in a month or so.

AndreyOR
Joined: 13 Oct 21
Posts: 43
Credit: 60,932,436
RAC: 374,467
Message 74347 - Posted: 4 Oct 2022, 14:46:53 UTC - in response to Message 74342.  

Yeah, in general it's usually best to just keep crunching and let the admins decide if something needs to be done server side, like turning off the task generator, for example. The validator and unsent task queues can feed each other and make it look like no progress is being made when it is; it's just not yet visible.

I contribute to various projects but sometimes focus on one at a time also. I assume that's what you're doing too. I'd just hope that people wouldn't stop contributing because the project is experiencing some difficulties.

San-Fernando-Valley
Joined: 13 Apr 17
Posts: 215
Credit: 131,318,912
RAC: 14,971
Message 74349 - Posted: 4 Oct 2022, 14:52:20 UTC - in response to Message 74347.  

+1
©2022 Astroinformatics Group