Long crunch time on new N-Body simulations?
Joined: 8 Jan 18 Posts: 44 Credit: 43,798,833 RAC: 2,383
I've noticed that a few of my N-Body tasks are taking far longer than they used to. What would normally take a few hours is creeping up to 24 hours or more. Not all of my N-Body tasks are like this, but a few of them are here and here. I'm letting them crunch through. They don't seem to be "stuck", just taking a while. They should be able to complete before the deadline, assuming no delays. I just wanted to see if anyone else is experiencing this.
Joined: 9 Jul 17 Posts: 100 Credit: 16,967,906 RAC: 0
Yes, I saw it. I think it is normal.
Joined: 19 Jul 10 Posts: 627 Credit: 19,303,095 RAC: 1,042
Well, this isn't very surprising since the new application is using only one single core.
Joined: 8 Jan 18 Posts: 44 Credit: 43,798,833 RAC: 2,383
Quote: "Well, this isn't very surprising since the new application is using only one single core."
Ok, true, but I think the estimated completion time is being underestimated. I had some tasks that took 24-26 hours to complete, but I think they were originally estimated at 11 hours or so. I didn't pay close enough attention to know for sure, so I'll check upcoming tasks to see whether this is really the case.
Joined: 12 Dec 15 Posts: 53 Credit: 133,288,534 RAC: 0
Quote: "Ok, true, but I think the estimated completion time is being under-estimated. I had some tasks that took 24-26 hours to complete, but I think they were originally estimated at 11 hours or so."
The BOINC client keeps a running average of completion times to estimate completion, and the old runtimes outweigh the new runtimes in the average. I think if you set <rec_half_life_days>0</rec_half_life_days> in cc_config.xml, restart BOINC and run it for an hour, then set it back to the default of 10 days (<rec_half_life_days>10</rec_half_life_days>), you'll reset the running averages (of all WUs), and the estimate should be close to the right number within 24 hours. I don't usually worry about the estimate (it's almost always wrong), so I haven't tested this. https://boinc.berkeley.edu/wiki/Client_configuration
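To illustrate why a long memory of old runtimes can drown out a sudden change, here is a minimal Python sketch of a half-life-weighted running average. It is a generic toy model, not BOINC's actual estimator (the thread does not show that code), and the function name and sample history are hypothetical: each past runtime's weight simply halves every `half_life_days`.

```python
def half_life_average(samples, half_life_days):
    """Half-life-weighted mean of (age_days, runtime_hours) samples.

    Hypothetical helper for illustration only; not BOINC code.
    Each sample's weight halves every `half_life_days`, so a long
    half-life lets old runtimes dominate the average.
    """
    if not samples:
        raise ValueError("need at least one sample")
    if half_life_days <= 0:
        # Degenerate case: only the newest (smallest age) sample counts.
        return min(samples, key=lambda s: s[0])[1]
    total_weight = 0.0
    weighted_sum = 0.0
    for age_days, runtime_hours in samples:
        weight = 0.5 ** (age_days / half_life_days)
        total_weight += weight
        weighted_sum += weight * runtime_hours
    return weighted_sum / total_weight


# Hypothetical history: a month of ~3-hour tasks, then two ~24-hour tasks.
history = [(float(age), 3.0) for age in range(3, 31)] + [(1.5, 24.0), (0.5, 24.0)]

for half_life in (10, 1):
    print(f"half-life {half_life:>2} days: {half_life_average(history, half_life):.1f} h")
```

With these made-up numbers, the 10-day half-life still reports an average much closer to the old 3-hour runtimes, while the 1-day half-life moves most of the way toward the new 24-hour runtimes.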
Joined: 24 Jan 11 Posts: 715 Credit: 556,089,655 RAC: 51,411
Quote: "BOINC client keeps a running average of completion times to estimate completion and the old runtimes outweigh the new runtimes in the average."
This is the correct way to update the estimated times. Or, if you want an estimate that changes more rapidly with a quickly changing data mix, change <rec_half_life_days>10</rec_half_life_days> to <rec_half_life_days>1</rec_half_life_days>, and your estimates will only average over the last day. That is what I run on my clients, since Seti has a fairly diverse data mix that changes daily.
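For reference, a minimal cc_config.xml carrying the setting discussed here can be generated with Python's standard `xml.etree.ElementTree`. This is only a sketch: the helper name `make_cc_config` is hypothetical, and it builds a fresh document containing nothing but `<options><rec_half_life_days>`, so if you already have a cc_config.xml with other options, merge the element by hand rather than overwriting the file.

```python
import xml.etree.ElementTree as ET


def make_cc_config(rec_half_life_days: float) -> str:
    """Return a minimal cc_config.xml document setting rec_half_life_days.

    Hypothetical helper; it does not merge with an existing cc_config.xml.
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    setting = ET.SubElement(options, "rec_half_life_days")
    # The :g format writes 1 as "1" and 0.5 as "0.5" (no trailing ".0").
    setting.text = f"{rec_half_life_days:g}"
    return ET.tostring(root, encoding="unicode")


print(make_cc_config(1))
# <cc_config><options><rec_half_life_days>1</rec_half_life_days></options></cc_config>
```

The client only picks up a changed cc_config.xml after it re-reads its configuration files (for example, after a restart).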
Joined: 8 Jan 18 Posts: 44 Credit: 43,798,833 RAC: 2,383
Thanks for the input, Marmot and Keith. The reason I noticed this is that I am running MW, Einstein, and Seti all on the same machine. Einstein also has some tasks that are taking about 10 hours. I had several tasks from MW and Einstein that needed a lot of compute time, and their due dates were approaching. I now suspect that I just had the "store at least" / "store up to an additional" days-of-work settings a little high. Combined with two projects whose newer tasks take longer to compute than before, that is probably what caused my concern. I had the settings at 1 and 5 days, but I have dialed them back to 0.5 and 2. We'll see how it goes from there.
Joined: 28 Dec 18 Posts: 14 Credit: 1,419,832 RAC: 0
[Deleted]
Joined: 8 Jan 18 Posts: 44 Credit: 43,798,833 RAC: 2,383
So, I think something is going on here. I currently have four N-Body 1.76 tasks running that so far have elapsed 2 or 3 DAYS, with most of those having 10+ hours to remain. These tasks were downloaded over a week ago (6 June 2019, 9 June 2019), and I am pretty sure I would have noticed estimated times that long. Marmot, I did try adjusting the half-life as you suggested, but it did not pick up this discrepancy in the ETA. This morning I adjusted my half-life setting down to 1. I feel this is a problem that shouldn't be happening. The CPU, although it's in a laptop, is from 2017, so it isn't a slow processor. It has higher GFLOPS per core than an i7-8700. Is there any kind of debugging I can do to evaluate this?
Joined: 18 Jul 09 Posts: 300 Credit: 303,566,315 RAC: 77
Quote: "So, I think something is going on here. I currently have four N-Body 1.76 tasks running that so far have elapsed 2 or 3 DAYS, with most of those having 10+ hours to remain."
My aging dual-core AMD Phenom II is processing N-Body 1.76 tasks in just a few hours, and all of them have been pretty close to the estimated time.
Joined: 24 Jan 11 Posts: 715 Credit: 556,089,655 RAC: 51,411
When the run_time greatly exceeds the cpu_time, that indicates an overcommitted CPU. Try running fewer tasks, or reduce the number of background processes that are stealing CPU cycles from the crunching.
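This rule of thumb can be written down as a ratio of CPU time to run (wall-clock) time, both of which appear on a task's results page. The sketch below is a hedged illustration: the function names and the 0.8 cutoff are arbitrary choices, not BOINC values, and for a multi-threaded task the CPU time is divided by the number of threads, since summed CPU time can legitimately exceed run time.

```python
def cpu_efficiency(cpu_time_s: float, run_time_s: float, threads: int = 1) -> float:
    """Fraction of the available thread-seconds the task actually spent on a CPU."""
    if run_time_s <= 0 or threads < 1:
        raise ValueError("run_time_s must be positive and threads must be >= 1")
    return cpu_time_s / (run_time_s * threads)


def looks_overcommitted(cpu_time_s: float, run_time_s: float,
                        threads: int = 1, cutoff: float = 0.8) -> bool:
    """True when run time greatly exceeds CPU time (arbitrary 0.8 cutoff)."""
    return cpu_efficiency(cpu_time_s, run_time_s, threads) < cutoff


# Hypothetical tasks: (cpu_time_s, run_time_s, threads)
tasks = {
    "starved single-thread task": (72_000, 216_000, 1),  # 20 h CPU over 60 h
    "healthy single-thread task": (9_800, 10_000, 1),
    "healthy 4-thread task": (14_000, 3_600, 4),  # CPU time > run time is normal here
}
for name, (cpu, run, threads) in tasks.items():
    flag = "overcommitted?" if looks_overcommitted(cpu, run, threads) else "ok"
    print(f"{name}: efficiency {cpu_efficiency(cpu, run, threads):.2f} -> {flag}")
```

A task that has spent only a third of its elapsed time on the CPU, like the first hypothetical one, is the pattern described above.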
Joined: 8 Jan 18 Posts: 44 Credit: 43,798,833 RAC: 2,383
Sorry for the late response, but I think I know the culprit. I was crunching Intel GPU tasks for Seti@home, and that slowed everything down. I don't know why I didn't figure that out in the first place. Times have sped up significantly since I stopped running the iGPU tasks.
Joined: 8 Jan 18 Posts: 44 Credit: 43,798,833 RAC: 2,383
I'm back. Same problem, different computer (located here). All my MW N-body 1.76 tasks that haven't started have ETAs between about 50 minutes (minimum) and 3 hours (maximum). Today I have two running, each with about 10 hours completed, but they now show completion times of 15 to 19 hours. When I checked on this computer last night, I had NNT (No New Tasks) set for MW, and none of the N-body tasks had an ETA that long. I also had a task with an ETA of about 2 hours when I started it earlier today; after running 6.5 hours, it now shows about 20 hours remaining. It appears that the initial ETAs for these tasks are underestimated by a factor of eight!
I'm concerned about two things. First, I may have downloaded more N-body tasks than my CPU can crunch in the two-week timeframe (even ignoring other projects). Second, I'm afraid the deadlines for my other work (e.g., MW) will pass because my computer will have spent too much time on the N-body tasks. Basically, I suspect that with underestimated ETAs, BOINC will download more tasks than my computer can crunch in the allotted time.
I have currently suspended all Seti and Einstein tasks, as well as all non-N-body tasks for MW. I also have Einstein and MW set to NNT (Seti is just plain old suspended; their deadlines are very generous). So all four cores are crunching MW all the time, and no GPUs are running to complicate or burden the CPU tasks. I'm hoping that by running the underestimated MW tasks early, I can get ALL tasks completed before their deadlines. Is there a way to get the initial ETA estimation set higher for these tasks? Without knowing the code, I'm guessing this is something that should be sorted out on the server end, not the user end.
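The jump in ETA once a task starts running can be illustrated with a naive progress-based extrapolation: if a task is a fraction f done after t hours, a constant rate of progress implies about t/f hours in total. The sketch below uses illustrative numbers in the range reported above (a 3-hour initial estimate and 10 hours elapsed) plus a hypothetical fraction done of 0.4; the real client's remaining-time estimate is more involved, so treat this as an approximation, not its actual formula.

```python
def progress_extrapolation(elapsed_h: float, fraction_done: float,
                           initial_estimate_h: float) -> tuple[float, float]:
    """Return (remaining_h, underestimate_factor), assuming a constant progress rate.

    Illustrative model only; not the BOINC client's actual estimator.
    """
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    if initial_estimate_h <= 0:
        raise ValueError("initial_estimate_h must be positive")
    projected_total_h = elapsed_h / fraction_done
    remaining_h = projected_total_h - elapsed_h
    return remaining_h, projected_total_h / initial_estimate_h


remaining, factor = progress_extrapolation(elapsed_h=10, fraction_done=0.4,
                                           initial_estimate_h=3)
print(f"{remaining:.1f} h remaining, about {factor:.1f}x the initial estimate")
# 15.0 h remaining, about 8.3x the initial estimate
```

Under these assumptions, a task estimated at 3 hours that is only 40% done after 10 hours projects to roughly 25 hours in total, the same order as the factor of eight reported in the post.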
Joined: 8 Jan 18 Posts: 44 Credit: 43,798,833 RAC: 2,383
I don't think I'm the only one who has had this problem. Check out https://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=1796656112. Two tasks timed out while computing, one didn't start before the deadline, and only one task completed.
Joined: 27 Jul 14 Posts: 23 Credit: 921,261,826 RAC: 0
Quote: "Is there a way to get the initial ETA estimation set higher for these tasks? Without knowing the code, I'm guessing this is something that should be sorted out on the server end, not the user end."
The best way to handle this is to lower your cache (aka "Store at least N days of work" in your preferences), at least until the estimates are more accurate. I don't usually set mine to more than half a day, and I set it that low precisely because I've seen this problem.
Team USA forum | Team USA page Always crunching / Always recruiting
Joined: 8 Jan 18 Posts: 44 Credit: 43,798,833 RAC: 2,383
That's fair, but when my settings are a minimum of 2 days plus 5 days additional, how is it that I can't complete tasks within a 2-week deadline? The computer crunches 24/7, using 100% of the CPU time on all cores. I am not restricting BOINC from attempting to complete tasks at all.
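A back-of-the-envelope answer to that question, as a hedged sketch: it assumes the client fills the queue to roughly "at least" plus "additional" days of estimated work, and that every queued task really takes a fixed multiple of its estimate (the factor of eight reported earlier in the thread). Real work fetch also splits the queue across projects and devices, so the function names and the model are simplifications, not BOINC's logic.

```python
def queued_work_days(min_days: float, extra_days: float,
                     underestimate_factor: float) -> float:
    """Actual days of computing in a full queue when estimates are too low.

    Simplified model: the queue holds min_days + extra_days of estimated work,
    and each task really runs underestimate_factor times longer than estimated.
    """
    return (min_days + extra_days) * underestimate_factor


def fits_before_deadline(min_days: float, extra_days: float,
                         underestimate_factor: float,
                         deadline_days: float = 14) -> bool:
    """True if the whole queue could finish within the deadline."""
    return queued_work_days(min_days, extra_days, underestimate_factor) <= deadline_days


# Settings from the post (2 + 5 days) versus a half-day cache (0.5 + 0 days),
# both with estimates that are 8x too low.
for min_days, extra_days in ((2, 5), (0.5, 0)):
    days = queued_work_days(min_days, extra_days, 8)
    ok = fits_before_deadline(min_days, extra_days, 8)
    print(f"{min_days} + {extra_days} days cached -> {days:g} days of real work, fits: {ok}")
```

Under these assumptions, a 2 + 5 day cache holds about 56 days of real work against a 14-day deadline, while a half-day cache holds about 4.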
Joined: 27 Jul 14 Posts: 23 Credit: 921,261,826 RAC: 0
My guess is that the combination of work now taking a lot longer than before, plus multi-threaded work running again, has really confused things. The client is also bad at accounting for anything in app_config that might affect run times, specifically max_concurrent, so if you are using that, it might be part of the problem. Regardless, a low cache is the only way to keep from getting too much work.
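One way to see why max_concurrent can throw off work fetch, under a deliberately simplified assumption: if the client sizes the queue as though every core were available to an app, but app_config.xml lets only max_concurrent of that app's tasks run at once, the queue drains more slowly by a factor of cores / max_concurrent. The function name and the model are hypothetical, and how a given BOINC client version actually treats max_concurrent may differ.

```python
def drain_time_days(estimated_days: float, ncpus: int, max_concurrent: int) -> float:
    """Wall-clock days to empty a queue sized for `ncpus` parallel tasks
    when only `max_concurrent` of them may run at once.

    Simplified: equal-length single-threaded tasks, queue sized as if all
    `ncpus` cores were available to this app.
    """
    if ncpus < 1 or max_concurrent < 1:
        raise ValueError("ncpus and max_concurrent must be >= 1")
    running = min(max_concurrent, ncpus)
    return estimated_days * ncpus / running


# Hypothetical 4-core host with 7 days of estimated work queued.
for limit in (4, 2, 1):
    print(f"max_concurrent={limit}: {drain_time_days(7, 4, limit):g} days")
```

In this toy model, limiting the app to one task at a time on a 4-core host turns 7 days of queued work into about 28 days, well past a 14-day deadline.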
Joined: 18 Jul 09 Posts: 300 Credit: 303,566,315 RAC: 77
Quote: "That's fair, but when my settings are a minimum of 2 days, 5 days additional, how is it I can't complete tasks within a 2 week deadline?"
Very few people can crunch both N-Body and Separation WUs on the same computer. The change from single-threaded to multi-threaded N-Body work is only going to make the scheduler even harder to deal with.
Joined: 8 Jan 18 Posts: 44 Credit: 43,798,833 RAC: 2,383
Well, for now I'm going to just stick with the Separation tasks and not crunch N-Body. I only received a few 4-CPU N-Body tasks; the rest were 1-core tasks. This seems more like a bug than a feature to me. IMHO, this would be a deterrent to people who just want to "set and forget" BOINC. However, I think I've stood on this soapbox long enough, so I'll drop it for now unless someone else wants to address it.
Joined: 29 Aug 07 Posts: 21 Credit: 1,050,702 RAC: 0
Is there a difference in runtimes between the work units? A recent 4-core unit took longer than a 3-core unit! Regards, Bob P.
©2024 Astroinformatics Group