Server Maintenance 12:00 PM ET (16:00 UTC) 9/23/2022
Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0
The recent server slowness was due to us copying data over from this server to the new machine. That's all done now, and we should notice an increase in speed again. I'm not exactly sure why the validation waiting count keeps climbing, but it might have something to do with the sleep call in the validator code. Looking into why that call is there is on our to-do list for when we recompile the binaries for the new server. I think there are just fewer people who work on Nbody because it doesn't have a GPU application (so it's much less efficient in terms of credit), so it takes longer to crunch through the backlog. It's really not a problem as far as I can tell.
Joined: 12 Jun 10 Posts: 57 Credit: 6,233,369 RAC: 1,334
Thank you for your input, Mikey. I completely agree with what you are saying. I have processed all of my _3 and _2 tasks, and now I am just working through my remaining 84 _1 tasks. Sometimes you get lucky with _1 tasks in that they validate as soon as you return them :-) Here is an example. It is really handy when you get a task that runs for only seconds; it helps empty your task list quicker. The N-Body tasks waiting to be sent are slowly dropping, which is nice to see: as I write, there are 115991 queued waiting to be sent. Tom, thanks for all of the hard work and for keeping us updated on what's happening.
Joined: 9 Jul 17 Posts: 100 Credit: 16,967,906 RAC: 0
> I think there are just fewer people who work on Nbody because it doesn't have a GPU application (so it's much less efficient in terms of credit), so it takes longer to crunch through the backlog. It's really not a problem as far as I can tell.

I think the real reason is that if you want to run Separation only on the GPU, you have to turn off "Use CPU". Then, of course, you can't run N-Body. You need to allow running only the Separation GPU work units, without the CPU Separation work units, while still allowing N-Body to run on the CPU. (Don't anyone tell me to use two BOINC instances. I have been doing that for ages. Not many people will.)
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0
> I think there are just fewer people who work on Nbody because it doesn't have a GPU application (so it's much less efficient in terms of credit), so it takes longer to crunch through the backlog. It's really not a problem as far as I can tell.

YES PLEASE!!

> (Don't anyone tell me to use two BOINC instances. I have been doing that for ages. Not many people will.)

What I do is run GPU tasks on one PC and CPU tasks on another, and do it the other way around at other projects. Right now I'm mostly running CPU Separation tasks, but that's because I need hours for another WUProp badge. I also have a couple of GPUs that won't work elsewhere, so they are doing Separation tasks too.
Joined: 8 Nov 11 Posts: 205 Credit: 2,900,464 RAC: 0
The latest Nbody Simulation tasks are taking a lot of resources: over two hours across 8 CPUs, i.e. more than 16 hours of CPU time. Previously, when I ran them, they took 4-6 minutes. I can see why people are reluctant to run them.
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0
> The latest Nbody Simulation tasks are taking a lot of resources: over two hours across 8 CPUs, i.e. more than 16 hours of CPU time. Previously, when I ran them, they took 4-6 minutes. I can see why people are reluctant to run them.

Are you using all of your CPU cores, i.e. 8, to run the tasks? If so, try running them on only 7 or even 6 cores and see if that helps; sometimes the OS gets overwhelmed when all the CPU cores are running the exact same task here at MilkyWay.
Joined: 8 Nov 11 Posts: 205 Credit: 2,900,464 RAC: 0
Thanks, Mikey. I was only using 8 out of 16 cores, and the system sometimes showed 60% usage. It seems to be a batch of long simulations. Maybe I'll try reducing it to 6 or 4.
Joined: 12 Jun 10 Posts: 57 Credit: 6,233,369 RAC: 1,334
I wish I could use my other 7 cores, leaving one free for the GPU, but it seems it will only use a maximum of 16, even with the following in app_config.xml in the project folder:

    <app_config>
      <app>
        <name>milkyway</name>
        <fraction_done_exact/>
        <CPU_version>
          <cpu_usage>23</cpu_usage>
        </CPU_version>
      </app>
    </app_config>
Joined: 28 May 17 Posts: 76 Credit: 4,398,910,125 RAC: 24
> I wish I could use my other 7 cores, leaving one free for the GPU, but it seems it will only use a maximum of 16, even with the following in app_config.xml in the project folder

The easiest way to get this done would be running two BOINC instances on the same host. Set one for CPU-only work and set its CPU setting in BOINC Manager to use only 23 cores (it'll be a % of your total cores); set the other one to run GPU only.
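For the two-instance approach, the "% of total cores" setting in BOINC Manager ("Use at most N% of the CPUs") is stored as max_ncpus_pct in the client's global_prefs_override.xml. Below is a minimal sketch for the CPU-only instance, assuming a host with 24 logical CPUs (16 + 7 + 1, going by the post above) and that 23 of them should go to CPU work. 23/24 is about 95.83%; the sketch uses 96 instead so the result lands on 23 whether the client rounds or truncates (24 × 0.96 = 23.04).

```xml
<global_preferences>
  <!-- CPU-only BOINC instance (sketch). Assumes 24 logical CPUs.
       Same setting as "Use at most N% of the CPUs" in BOINC Manager.
       96% of 24 is 23.04, which comes out as 23 CPUs. -->
  <max_ncpus_pct>96</max_ncpus_pct>
</global_preferences>
```

A second client on the same machine also has to be allowed (the allow_multiple_clients option in cc_config.xml) and needs its own data directory; check the BOINC client configuration documentation before relying on this. As noted further down the thread, a single N-Body task uses at most 16 threads, so filling 23 cores also requires more than one task running at once.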
Joined: 12 Jun 10 Posts: 57 Credit: 6,233,369 RAC: 1,334
Currently MilkyWay is the only project running, and "% of total cores" is set to 100%, yet it still says it is using only "16 CPUs". In Task Manager, "CPU %" is between 55 and 56, so I am not sure this will work. Are you running 2 BOINC instances on the same drive?
Joined: 26 May 11 Posts: 32 Credit: 43,959,896 RAC: 0
In response to the long-running tasks: FYI, over the years it seems that when a new task group starts, the tasks with the lowest numbers in the task name just before the last _ (underscore), e.g. below 1000000, will have long run times. The higher the number, the shorter the run time. So as we crunch through the low numbers, run times typically, and historically, become shorter until the next group hits and the next sequence of 3 starts. However, this often takes many weeks. As a speculative guess (and I suppose only the MilkyWay staff could answer this), I would think the low numbers start off as a point, or pixel, close to the core of the Milky Way, and then as the number increments, the referenced position advances outward. The further from the center, the smaller the effects on that position, meaning faster crunch times? I would not hold my breath for a staff response, as I rarely see staff from any project reply with additional information. In the meantime, we can speculate.
Joined: 13 Oct 21 Posts: 44 Credit: 227,399,122 RAC: 19,824
N-Body can only use a maximum of 16 cores per task, so you'll have to reduce the number of cores per task and run multiple tasks simultaneously. Unless modified, by default it'll use all available cores, up to 16. I know some people do it, but I've never worried about leaving a core free for the GPU. It doesn't seem to make a difference, although I've never done a more detailed test. Additionally, your app_config looks incorrect and is likely being ignored. Check here for the correct format and syntax of the file: https://boinc.berkeley.edu/wiki/Client_configuration#Project-level_configuration. If you're just trying to modify N-Body, "milkyway" is not the right name, and you'd need to use the app_version section of app_config.
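Following that advice, a corrected version of the earlier app_config.xml might look like the sketch below. Assumptions: milkyway_nbody is the N-Body app name (the name used later in this thread), the mt plan class applies, and 7 threads per task is only an illustrative choice; on 23 available CPUs, three such tasks would occupy 21. The CPU_version element in the earlier file is not an app_config element as far as we know, so BOINC should not act on it.

```xml
<app_config>
  <!-- Per-app options go in <app>, keyed by the app's short name.
       milkyway_nbody is assumed to be the N-Body app name. -->
  <app>
    <name>milkyway_nbody</name>
    <fraction_done_exact/>
  </app>
  <!-- Threads per task for the multithreaded (mt) N-Body build.
       7 is an illustrative value, not a recommendation. -->
  <app_version>
    <app_name>milkyway_nbody</app_name>
    <plan_class>mt</plan_class>
    <avg_ncpus>7</avg_ncpus>
  </app_version>
</app_config>
```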
Joined: 12 Jun 10 Posts: 57 Credit: 6,233,369 RAC: 1,334
Thank you for the information. Using only 8 cores seems to have decreased the runtime; however, I am still only running 1 task at a time and am not sure what to do to increase it to 2.
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0
> Thank you for the information. Using only 8 cores seems to have decreased the runtime; however, I am still only running 1 task at a time and am not sure what to do to increase it to 2.

You add something like this to your app_config.xml file inside the MilkyWay project folder:

    <app_version>
      <app_name>milkyway_nbody</app_name>
      <max_concurrent>1</max_concurrent>
      <plan_class>mt</plan_class>
      <avg_ncpus>2</avg_ncpus>
      <cmdline>--nthreads 2</cmdline>
    </app_version>

By changing the <max_concurrent> number to 2 and then changing <avg_ncpus> to 8, it should run 2 tasks at a time using 8 CPU cores for each task. Be sure to use Notepad in Windows or a text editor in Linux, as word-processing programs add hidden formatting that will cause BOINC to ignore the whole file.
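One detail worth checking against the BOINC app_config documentation: max_concurrent is an option of the app element, not of app_version, so in the snippet above it may simply be ignored. Below is a minimal sketch of the 2-tasks-at-8-threads setup, assuming the same milkyway_nbody app name and mt plan class. The cmdline entry is left out; if you keep it, its --nthreads value has to be changed to 8 as well, or it will disagree with avg_ncpus.

```xml
<app_config>
  <!-- max_concurrent belongs to <app>. It caps the number of running
       N-Body tasks at 2 but does not force BOINC to run 2. -->
  <app>
    <name>milkyway_nbody</name>
    <max_concurrent>2</max_concurrent>
  </app>
  <!-- 8 threads per task, so two tasks together use 16 CPUs. -->
  <app_version>
    <app_name>milkyway_nbody</app_name>
    <plan_class>mt</plan_class>
    <avg_ncpus>8</avg_ncpus>
  </app_version>
</app_config>
```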
Joined: 12 Jun 10 Posts: 57 Credit: 6,233,369 RAC: 1,334
Thanks, Mikey. Apologies to everyone: in making changes to my app info file, I trashed 20 tasks. No more CPU tasks for me. I will also wait until the number of tasks waiting for validation comes down before contributing again with my GPU.
Joined: 13 Oct 21 Posts: 44 Credit: 227,399,122 RAC: 19,824
Making changes to the app_config usually doesn't make tasks error out, as most of the entries affect BOINC only (not the project app), and incorrect formatting or syntax is usually ignored. However, there's one optional entry that affects the project app, cmdline (where --nthreads is usually the argument), and it can cause tasks to crash if the format or syntax is incorrect. That's what made your tasks crash. Here's an excerpt from the error log of one of the tasks:

    Argument parsing error: --nthreads>2: unknown option
    Failed to read arguments

The syntax is --nthreads x, where x is the number of threads you want to use. Having said that, the optional cmdline entry --nthreads ... is unnecessary for MilkyWay N-Body, as avg_ncpus does the job for both BOINC and the project app.

Here's an app_config that has N-Body use 4 threads per task and runs 2 Separation GPU tasks simultaneously:

    <app_config>
      <app>
        <name>milkyway</name>
        <gpu_versions>
          <gpu_usage>.5</gpu_usage>
          <cpu_usage>.9</cpu_usage>
        </gpu_versions>
      </app>
      <app_version>
        <app_name>milkyway_nbody</app_name>
        <plan_class>mt</plan_class>
        <avg_ncpus>4</avg_ncpus>
      </app_version>
    </app_config>

One of the reasons only one N-Body task is running could be how you allocated resources to BOINC itself and to the various projects. BOINC uses that information to determine how many tasks of which project to run and when. max_concurrent is only an upper limit and won't force BOINC to run a certain number of tasks.

I'd suggest not worrying about runtimes. The credit per unit of runtime is pretty much the same regardless of how long a task takes. It's perfectly fine to run N-Body on 1 or 2 cores. Too many cores will actually make things less productive. I, for example, found that 4 cores per task gives the best tasks/hour rate, and anything higher than about 9 cores is no better than, and even worse than, 2 cores.

May I suggest you don't give up on CPU tasks, or GPU ones for that matter. Just take the time to figure out the resource allocation and make a valid app_config. The high validation count will come down in due time. The current issue is nothing close to the aftermath of the disk crash a few months ago, and in the end everything straightened out: no tasks were lost and everyone got their credit. This will be no different. Users leaving is very likely worse for the project, as that means fewer PCs to clear out the validation backlog and the high N-Body queue, which just means that everything will take longer.
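If you do want to keep the optional cmdline entry, this is how the argument format described in that reply looks in context. It is a sketch assuming 4 threads per task, matching the example app_config; keeping the --nthreads value equal to avg_ncpus is our own assumption, so that BOINC's CPU accounting matches what the app actually uses.

```xml
<app_config>
  <app_version>
    <app_name>milkyway_nbody</app_name>
    <plan_class>mt</plan_class>
    <avg_ncpus>4</avg_ncpus>
    <!-- Optional: option name, one space, then the thread count.
         A character other than a space there is what the argument
         parser rejected in the error log quoted above. -->
    <cmdline>--nthreads 4</cmdline>
  </app_version>
</app_config>
```

After saving the file, Options > Read config files in BOINC Manager reloads the configuration without restarting the client.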
Joined: 13 Apr 17 Posts: 256 Credit: 604,411,638 RAC: 0
+1
Joined: 12 Jun 10 Posts: 57 Credit: 6,233,369 RAC: 1,334
Thanks for your feedback; you are right, and thanks for the app_config example. I was thinking that, with all the pending work I was creating, I was adding extra pressure on the database; this is clearly not the case. Once my current GPU task is complete, I will more than likely come back here with it. I have other plans for my CPUs currently, but I will certainly consider bringing them back over here in a month or so.
Joined: 13 Oct 21 Posts: 44 Credit: 227,399,122 RAC: 19,824
Yeah, in general it's usually best to just keep crunching and let the admins decide if something needs to be done server-side, like turning off the task generator, for example. The validator and unsent-task queues can feed each other and make it look like no progress is being made when it is; it's just not yet visible. I contribute to various projects but sometimes focus on one at a time, too. I assume that's what you're doing as well. I'd just hope that people wouldn't stop contributing because the project is experiencing some difficulties.
Joined: 13 Apr 17 Posts: 256 Credit: 604,411,638 RAC: 0
+1
©2024 Astroinformatics Group