Server Maintenance 12:00 PM ET (16:00 UTC) 9/23/2022

Author	Message
Tom Donlon Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0	Message 74303 - Posted: 28 Sep 2022, 16:05:34 UTC The server slowness recently was due to us copying data over from this server to the new machine. That's all done now and we should notice an increase in speed again. I'm not exactly sure why the validation waiting count keeps climbing, but yeah it might have something to do with the sleep call in the validator code. Taking a look at why that is there is on our to-do list when we recompile binaries for the new server. I think there are just fewer people who work on Nbody because it doesn't have a GPU application (so it's much less efficient in terms of credit), so it takes longer to crunch through the backlog. It's really not a problem as far as I can tell.

Speedy51 Joined: 12 Jun 10 Posts: 57 Credit: 6,527,559 RAC: 397	Message 74308 - Posted: 29 Sep 2022, 1:59:23 UTC - in response to Message 74301. Last modified: 29 Sep 2022, 2:06:02 UTC Thanks Mikey for your input. I completely agree with what you are saying. I have processed all of my _3 and _2 tasks; now I am just working through my remaining 84 _1 tasks. Sometimes you get lucky with _1 tasks, in that they validate as soon as you return them :-) here is an example. It is really handy when you get a task that runs for seconds; it helps empty your list of tasks quicker. N-Body tasks waiting to be sent are slowly dropping, which is nice to see; as I write there are 115991 queued waiting to be sent. Tom, thanks for all of the hard work and for keeping us updated with things that are happening.

Jim1348 Joined: 9 Jul 17 Posts: 100 Credit: 16,967,906 RAC: 0	Message 74309 - Posted: 29 Sep 2022, 9:24:50 UTC - in response to Message 74303. Last modified: 29 Sep 2022, 9:27:11 UTC "I think there are just fewer people who work on Nbody because it doesn't have a GPU application (so it's much less efficient in terms of credit), so it takes longer to crunch through the backlog. It's really not a problem as far as I can tell." I think the real reason is that if you want to run Separation only on the GPU, you have to turn off "Use CPU". Then, you of course can't run the N-Body. You need to allow running the Separation GPU work units only, without the CPU Separation work units, while still allowing N-Body to run on the CPU. (Don't someone tell me to use two BOINC instances. I have been doing that for ages. Not many people will.)

mikey Joined: 8 May 09 Posts: 3339 Credit: 524,398,788 RAC: 79	Message 74310 - Posted: 29 Sep 2022, 9:59:39 UTC - in response to Message 74309. "I think there are just fewer people who work on Nbody because it doesn't have a GPU application (so it's much less efficient in terms of credit), so it takes longer to crunch through the backlog. It's really not a problem as far as I can tell." "I think the real reason is that if you want to run Separation only on the GPU, you have to turn off 'Use CPU'. Then, you of course can't run the N-Body. You need to allow running the Separation GPU work units only, without the CPU Separation work units, while still allowing N-Body to run on the CPU." YES PLEASE!! "(Don't someone tell me to use two BOINC instances. I have been doing that for ages. Not many people will.)" What I do is run GPU tasks on this PC and CPU tasks on that PC, and do it the other way around at other projects. Right now I'm mostly running CPU Separation tasks, but that's because I need hours for another WUProp badge. I have a couple of GPUs that won't work elsewhere, so they too are doing Separation tasks.

Septimus Joined: 8 Nov 11 Posts: 205 Credit: 2,905,914 RAC: 0	Message 74313 - Posted: 29 Sep 2022, 12:34:32 UTC - in response to Message 74310. Last modified: 29 Sep 2022, 12:35:02 UTC The latest Nbody Simulation tasks are taking a lot of resources: over two hours across 8 CPUs, 16 hours plus of CPU time. They were previously taking 4-6 minutes when I did them before. Can see why people are reluctant to run them.

mikey Joined: 8 May 09 Posts: 3339 Credit: 524,398,788 RAC: 79	Message 74314 - Posted: 29 Sep 2022, 18:11:57 UTC - in response to Message 74313. "The latest Nbody Simulation tasks are taking a lot of resources: over two hours across 8 CPUs, 16 hours plus of CPU time. They were previously taking 4-6 minutes when I did them before. Can see why people are reluctant to run them." Are you using all of your CPU cores, i.e. 8, to run the tasks? If so, try running them using only 7 CPU cores, or even 6, and see if that helps. Sometimes OS's get overwhelmed when all the CPU cores are running the exact same task here at MilkyWay.

Septimus Joined: 8 Nov 11 Posts: 205 Credit: 2,905,914 RAC: 0	Message 74315 - Posted: 29 Sep 2022, 19:45:34 UTC - in response to Message 74314. Thanks Mikey... I was only using 8 out of 16 cores; the system sometimes showed 60% usage. Seems to be a batch of long simulations. Maybe I'll try a reduction down to 6 or 4.

Speedy51 Joined: 12 Jun 10 Posts: 57 Credit: 6,527,559 RAC: 397	Message 74318 - Posted: 29 Sep 2022, 20:51:32 UTC - in response to Message 74314. I wish I could use my other 7 cores, leaving one free for the GPU. It seems to me it will only use a maximum of 16, even with the following in app_config.xml in the project folder: <app_config> <app> <name>milkyway</name> <fraction_done_exact/> <CPU_version> <cpu_usage>23</cpu_usage> </CPU_version> </app> </app_config>

Skillz Joined: 28 May 17 Posts: 76 Credit: 4,425,653,574 RAC: 657,058	Message 74327 - Posted: 30 Sep 2022, 20:41:42 UTC - in response to Message 74318. "I wish I could use my other 7 cores, leaving one free for the GPU. It seems to me it will only use a maximum of 16, even with the following in app_config.xml in the project folder: <app_config> <app> <name>milkyway</name> <fraction_done_exact/> <CPU_version> <cpu_usage>23</cpu_usage> </CPU_version> </app> </app_config>" The easiest way to get this done would be running two BOINC instances on the same host. Set one for CPU-only work, then set the CPU settings in the BOINC Manager to only use 23 cores (it'll be a % of your total cores), and set the other one to run GPU only.

Speedy51 Joined: 12 Jun 10 Posts: 57 Credit: 6,527,559 RAC: 397	Message 74328 - Posted: 30 Sep 2022, 21:15:31 UTC - in response to Message 74327. "The easiest way to get this done would be running two BOINC instances on the same host. Set one for CPU-only work, then set the CPU settings in the BOINC Manager to only use 23 cores (it'll be a % of your total cores), and set the other one to run GPU only." Currently MilkyWay is the only project running and "% of total cores" is set to "100%", yet it still says it is using "16 CPUs". Under Task Manager, "CPU %" is between 55 and 56, so I am not sure this will work. Are you running 2 BOINC instances on the same drive?

jdzukley Joined: 26 May 11 Posts: 32 Credit: 47,728,352 RAC: 6	Message 74329 - Posted: 1 Oct 2022, 0:47:23 UTC - in response to Message 74314. Last modified: 1 Oct 2022, 1:01:55 UTC In response to long-running tasks... FYI, over the years, it seems that when a new task group starts, the tasks with the lowest numbers (like below 1000000) in the task name just before the last _ (underscore) have long run times. The higher the number, the shorter the run time. So as we crunch through the low numbers, the run times typically and historically become shorter until the next group hits and the next group of 3 sequence starts. However, this often takes many weeks. As a speculative guess, and I suppose it would have to be the MilkyWay staff that would have to reply, I would think the low numbers start off as a point, or pixel, close to the core of the Milky Way, and then as the numbers increment, the referenced position advances outward. The further from the center, the lesser the effects on that position, meaning faster crunch time? I would not hold my breath for any staff response, as I rarely see any staff from any site reply with additional information. In the meantime... we can speculate.

AndreyOR Joined: 13 Oct 21 Posts: 44 Credit: 234,435,348 RAC: 4,458	Message 74332 - Posted: 1 Oct 2022, 23:11:51 UTC - in response to Message 74318. Last modified: 1 Oct 2022, 23:14:16 UTC N-Body can only use a maximum of 16 cores per task so you'll have to reduce the number of cores per task and run multiple tasks simultaneously. Unless modified, it'll by default use all available cores up to 16. I know some people do it but I've never worried about leaving a core free for GPU. It doesn't seem to make a difference although I've never done a more detailed test. Additionally, your app_config looks incorrect and is likely ignored. Check here for the correct format and syntax of the file: https://boinc.berkeley.edu/wiki/Client_configuration#Project-level_configuration. If you're just trying to modify N-Body, "milkyway" is not the right name and you'd need to use the app_version section of app_config.

Speedy51 Joined: 12 Jun 10 Posts: 57 Credit: 6,527,559 RAC: 397	Message 74333 - Posted: 2 Oct 2022, 6:29:24 UTC - in response to Message 74332. Thank you for the information. Using only 8 cores seems to have decreased the runtime; however, I am still only running 1 task at a time and am not sure what to do to increase it to 2.

mikey Joined: 8 May 09 Posts: 3339 Credit: 524,398,788 RAC: 79	Message 74334 - Posted: 2 Oct 2022, 11:12:32 UTC - in response to Message 74333. "Thank you for the information. Using only 8 cores seems to have decreased the runtime; however, I am still only running 1 task at a time and am not sure what to do to increase it to 2." You add something like this to your app_config.xml file inside the MilkyWay project folder: <app_version> <app_name>milkyway_nbody</app_name> <max_concurrent>1</max_concurrent> <plan_class>mt</plan_class> <avg_ncpus>2</avg_ncpus> <cmdline>--nthreads 2</cmdline> </app_version> By changing the <max_concurrent> number to 2 and then changing <avg_ncpus> to 8, it should run 2 tasks at a time using 8 CPU cores for each task. Be sure to use Notepad in Windows or a text editor in Linux, as word-processing programs add hidden stuff that will cause BOINC to ignore the whole file.
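
For reference, the change described in the post above could be written out as a complete file roughly as follows. This is a sketch under assumptions, not a tested configuration: the app name milkyway_nbody and the plan class mt are taken from the post, while max_concurrent is placed in an <app> section because that is where the BOINC client configuration page linked in Message 74332 documents it. The cmdline entry is left out; if it is kept, its thread count would presumably need to agree with avg_ncpus. The XML comments are for the reader only and can be deleted.

```xml
<app_config>
  <!-- Allow at most two N-Body tasks to run at the same time. -->
  <app>
    <name>milkyway_nbody</name>
    <max_concurrent>2</max_concurrent>
  </app>
  <!-- Budget 8 CPU threads for each multithreaded (mt) N-Body task. -->
  <app_version>
    <app_name>milkyway_nbody</app_name>
    <plan_class>mt</plan_class>
    <avg_ncpus>8</avg_ncpus>
  </app_version>
</app_config>
```

After saving the file in the MilkyWay project folder, have the client re-read it (in BOINC Manager, Options, then "Read config files") or restart the client. Note that max_concurrent is only an upper limit, so whether two tasks actually run still depends on how many cores BOINC is allowed to use.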

Speedy51 Joined: 12 Jun 10 Posts: 57 Credit: 6,527,559 RAC: 397	Message 74337 - Posted: 2 Oct 2022, 21:58:43 UTC - in response to Message 74334. Thanks Mikey. Apologies to everyone: in making changes to my app_config file I trashed 20 tasks. No more CPU tasks for me. I will also wait until the number of tasks waiting for validation comes down before contributing again with my GPU.

AndreyOR Joined: 13 Oct 21 Posts: 44 Credit: 234,435,348 RAC: 4,458	Message 74338 - Posted: 3 Oct 2022, 3:34:19 UTC - in response to Message 74337. Last modified: 3 Oct 2022, 3:38:35 UTC Making changes to the app_config usually doesn't make tasks error out, as most of the entries affect BOINC only (not the project app) and incorrect formatting or syntax is usually ignored. However, there's one optional entry that affects the project app, cmdline (where --nthreads is usually the argument), and it can cause the tasks to crash if the format or syntax is incorrect. That's what made your tasks crash. Here's an excerpt from the error log of one of the tasks: Argument parsing error: --nthreads>2: unknown option Failed to read arguments The syntax is --nthreads x, where x is the number of threads you want to use. Having said that, the optional entry cmdline --nthreads ... is unnecessary for MilkyWay N-body, as avg_ncpus does the job for both BOINC and the project app. Here's an app_config that has N-body use 4 threads per task and runs 2 Separation GPU tasks simultaneously: <app_config> <app> <name>milkyway</name> <gpu_versions> <gpu_usage>.5</gpu_usage> <cpu_usage>.9</cpu_usage> </gpu_versions> </app> <app_version> <app_name>milkyway_nbody</app_name> <plan_class>mt</plan_class> <avg_ncpus>4</avg_ncpus> </app_version> </app_config> One of the reasons that only one N-body task is running could be how you allocated resources to BOINC itself and to various projects. BOINC uses that info to determine how many tasks of which project to run and when. max_concurrent is only a max limiter and won't force BOINC to run a certain number of tasks. I'd suggest not worrying about runtimes. The credit per unit of runtime is pretty much the same regardless of how long a task takes. It's perfectly fine to run N-body on 1 or 2 cores. Too many cores will actually make things less productive.
I, for example, found that 4 cores per task gives the best tasks/hour rate, and anything higher than about 9 cores is no better, or even worse, than 2 cores. May I suggest you don't give up on CPU tasks, or GPU ones for that matter. Just take the time to figure out the resource allocation and make a valid app_config. The high validation count will come down in due time. The current issue is nothing close to the aftermath of the disk crash a few months ago, and in the end everything straightened out, no tasks were lost and everyone got their credit. This will be no different. Users leaving is very likely worse for the project, as that's fewer PCs to clear out the validation backlog and the high N-Body queue, which just means that everything will take longer.
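
Laid out one element per line, the configuration quoted in the post above reads as follows. The element names and values are copied from the post; the XML comments are added here for illustration only and are an interpretation, not part of the original file.

```xml
<app_config>
  <!-- Separation (GPU): each task claims half a GPU, so two run per GPU. -->
  <app>
    <name>milkyway</name>
    <gpu_versions>
      <gpu_usage>.5</gpu_usage>
      <cpu_usage>.9</cpu_usage>
    </gpu_versions>
  </app>
  <!-- N-Body (multithreaded, plan class mt): 4 CPU threads per task. -->
  <app_version>
    <app_name>milkyway_nbody</app_name>
    <plan_class>mt</plan_class>
    <avg_ncpus>4</avg_ncpus>
  </app_version>
</app_config>
```

No cmdline entry appears, in line with the post's point that avg_ncpus is sufficient. If one is kept anyway, the form given in the post is --nthreads x, so for four threads the element would read <cmdline>--nthreads 4</cmdline>, with a space and nothing else between the option and the number.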

San-Fernando-Valley Joined: 13 Apr 17 Posts: 256 Credit: 604,411,638 RAC: 0	Message 74340 - Posted: 3 Oct 2022, 5:06:19 UTC - in response to Message 74338. +1

Speedy51 Joined: 12 Jun 10 Posts: 57 Credit: 6,527,559 RAC: 397	Message 74342 - Posted: 3 Oct 2022, 6:42:10 UTC - in response to Message 74338. Thanks for your feedback and your app_config example. I was thinking that, with all the pending work I was creating, I was adding extra pressure to the database; this is clearly not the case. Once my current GPU task is complete I will more than likely come back here with it. I have other plans for my CPUs currently, but I will certainly consider bringing them back over here in a month or so.

AndreyOR Joined: 13 Oct 21 Posts: 44 Credit: 234,435,348 RAC: 4,458	Message 74347 - Posted: 4 Oct 2022, 14:46:53 UTC - in response to Message 74342. Yeah, in general it's usually best to just keep crunching and let the admins decide if something needs to be done server side, like turning off the task generator, for example. The validator and unsent task queues can feed each other and make it look like no progress is being made when it is; it's just not yet visible. I contribute to various projects but sometimes focus on one at a time also. I assume that's what you're doing too. I'd just hope that people wouldn't stop contributing because a project is experiencing some difficulties.

San-Fernando-Valley Joined: 13 Apr 17 Posts: 256 Credit: 604,411,638 RAC: 0	Message 74349 - Posted: 4 Oct 2022, 14:52:20 UTC - in response to Message 74347. +1