N-Body long processing time
Author | Message |
---|---|
Joined: 20 Feb 14 Posts: 6 Credit: 20,780,697 RAC: 0 |
Hi, I just saw that the N-body task will take a long time to process. Is this correct, even on a good Intel i7 (3770, 3.4 GHz)? |
Joined: 18 Jul 09 Posts: 300 Credit: 303,673,545 RAC: 3,658 |
While it may be a longer work unit than other n-bodies in your cache, those time estimates are notoriously unreliable. Keep an eye on it; if it's taking too long, abort it. But it will probably be fine. |
Joined: 2 Jul 14 Posts: 15 Credit: 20,991,384 RAC: 0 |
It looks like it's just a ridiculous estimated time. I have an i7 4770 running at 3.5 GHz, and I've only ever gotten 1 task that took that long. It was taking too long, so after about 12 hours I aborted it manually. |
Joined: 3 May 10 Posts: 74 Credit: 1,532,760 RAC: 0 |
Hi, what is wrong with these N-body tasks? I have a record, for me, of 113 hours of processing time on one task. I also have a few that finished calculating but produced no results, i.e. 6 "Validation inconclusive" and 2 "Can't validate". The estimated calculation times are also rubbish: sometimes when I reach the estimate I have only done 10% of the task. I am running a dual core and have already contacted the guy who posted these tasks; around 6th August, when all the tasks were erroring out, he advised me to abort them. I think I may stop calculating these until they have been removed from the task list, as I don't like to abort them. John. P.S. I tried to post an image with this but I don't know how to do it. I have the .jpg but cannot post it here. |
Joined: 30 Nov 13 Posts: 7 Credit: 2,147,568 RAC: 0 |
I too am experiencing problems with N-body Simulation v1.42: a job ran for almost 17 hours on 2 CPUs, reached 100%, and then promptly crashed with a computation error. Task 805862330, workunit 596473367: sent 12 Aug 2014, 14:32:47 UTC; reported 22 Aug 2014, 21:41:18 UTC; status: Error while computing; run time 60,819.13 s; CPU time 103,158.20 s; credit: ---; application: MilkyWay@Home N-Body Simulation v1.42 (mt). It was running on a GenuineIntel Pentium(R) Dual-Core CPU T4500 @ 2.30GHz [Family 6 Model 23 Stepping 10] (2 processors). We are not amused!! Alan Barnes |
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
"I too am experiencing problems with N-body simulation v1.42 -- a job ran for almost 17 hours on 2 CPUs reached 100% and then promptly crashed with a computation error." A suggestion might be to stop running the modified fit units and run something like this instead, under "Run only the selected applications": MilkyWay@Home: yes; MilkyWay@Home N-Body Simulation: no; Milkyway@Home Separation: yes; Milkyway@Home Separation (Modified Fit): no. The modified fit units are multi-tasking, multi-CPU-core units, meaning they use multiple CPU cores to run, and even on the fastest PCs they can be troublesome. |
Joined: 21 Aug 13 Posts: 2 Credit: 117,645 RAC: 0 |
ps_nbody_08_05_orphan_sim_1_1405680903_433234_3 using milkyway_nbody version 142 was supposed to run for approx. 4.5 hours but ran for 33h. It was clear from the beginning that it would take that long, so the percentage of completion was OK, but the scheduled length was 7-8 times too short. It ran OK and didn't fail. I see that I have other 1.42 simulations (and the current one will take around 20-30h if percentages are correct, instead of 35 minutes as planned). I have 2 questions: 1) Am I wasting my time on these? I.e., is my computer for some reason much slower on these jobs than it should be? And therefore, should I avoid these jobs (for my own sake and that of the project)? 2) I am running other projects. Are they going to be disadvantaged? What I mean is: if I receive credits based on an estimated 4.5h job when the job actually ran 8 times longer, will the other projects be queued because BOINC thinks milkyway@home hasn't received enough "time-share"? Or does BOINC take into account the actual CPU run-time of different projects and therefore compensate for long run-times of milkyway (and not allow it 8 times more time-share than other projects)? Thomas |
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
"ps_nbody_08_05_orphan_sim_1_1405680903_433234_3 using milkyway_nbody version 142 was supposed to run for 4.5 hours (approx.) but ran for 33h. It was clear from the beginning that it would take that long, so the percentage of completion was OK but the scheduled length was 7-8 times too short." You don't get credits based on how long they think the job will take; you get credits based on how much work your PC does, with older PCs doing less work in the same amount of time than newer PCs. 2: Yes, BOINC will adjust and you will get less work from the other projects. You CAN compensate somewhat, but not totally. 1: IMHO yes, if you can. Not all work units run well on every PC; if these are taking too long and affecting your other projects, then yes, you might try running something else instead and see what happens. Your problem seems to be the total number of projects and their units you are trying to run at once; that, combined with the 'n-body multi-core units', is causing issues for you. I would turn off the 'n-body' units and run a different type of CPU unit that won't take multiple cores to run, and see if that is better for your setup. P.S. As long as the units get returned within their deadlines, most projects don't really care how long it takes you to crunch a unit. They care when EVERYONE is taking longer than expected when they make a unit, but your single PC taking that much longer they will just write off as a one-off exception. |
Joined: 21 Aug 13 Posts: 2 Credit: 117,645 RAC: 0 |
I'll avoid n-body then. Thank you very much! thomas |
Joined: 4 Jul 14 Posts: 8 Credit: 64,068,552 RAC: 0 |
I'm still a little confused. I've been running the N-bodies just fine for a month or so, and only recently did things start getting weird. I have been running a single N-body task now for nearly 48 hours, and it's only at 37%. The "Remaining (Estimated)" time block is actually still counting up, not down. Seems like a lot of things are going a little haywire here. Should I be killing this task, or let it run its course? |
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
"I'm still a little confused. I've been running the N-bodies just fine for a month or so, and only recently did things start getting weird. I have been running a single N-body task now for nearly 48 hours, and it's only at 37%. The 'Remaining (Estimated)' time block is actually still counting up, not down. Seems like a lot of things are going a little haywire here. Should I be killing this task, or let it run its course?" That sort of depends on you. If you feel okay letting it run, then let it run; it most likely IS working. But if you feel like it, just abort it and move on to the next one. Aborting a unit is not the end of the world; it just gets resent to someone else to crunch. Now, if you abort too many, your total daily allowance of units could suffer until you start returning 'valid' units again. Aborting a unit means you did NOT return it as 'valid'; normally a few make no difference, but a lot could. The n-body units use multiple CPU cores to crunch each unit; they make your PC kind of like a supercomputer, using most or even all of your CPU cores on a single workunit. NOT for 100% of the time, but for parts of the time; that is part of why the numbers are fluctuating. Another part is that they are all guesstimates based on formulas with data that changes based on whatever your PC is doing. So if you are watching a movie or playing a game, then BOINC can't use as much of the CPU cores for itself and the time goes up; leave the PC untouched and the times go back down again. That is why some people have multiple machines, so they can have one or two to 'work' on while the others are 'BOINC only' machines, meaning all they do is crunch. The 'work' machines also crunch, just not quite as fast as the 'BOINC only' machines. It is part of how the big RAC guys get that way. |
Joined: 25 Aug 14 Posts: 2 Credit: 7,991 RAC: 0 |
Hi, I am also running ps_nbody_08_05_orphan_sim_2 and having troubles. It says 1hr "Remaining" but I calculate a week remaining! "Elapsed"=13:45, "Progress"=7.657%; that's approx. 180hrs total, less 14 makes 166hrs to go (1 week). Given the deadline is 7/09/14, I'm inclined to abort, as I wasn't looking to tie up my PC for a week. |
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
"Hi, also running ps_nbody_08_05_orphan_sim_2 and having troubles. It says 1hr 'Remaining' but I calculate a week remaining! 'Elapsed'=13:45 'Progress'=7.657% thats approx 180hrs less 14 makes 166hrs to go (1 week). Given deadline is 7/09/14 I'm inclined to abort as I wasn't looking to tie up my pc for a week." If you choose to abort it, fine, but the estimates are NOT RIGHT, not even close: it could end in 10 minutes, or run for an hour and be done, or be done 10 minutes prior to the deadline, or even run longer than the deadline. My suggestion would be to uncheck the n-body units on the webpage under your account, in the preferences for this project, and then where it says "Run only the selected applications" (MilkyWay@Home; MilkyWay@Home N-Body Simulation; Milkyway@Home Separation; Milkyway@Home Separation (Modified Fit)), ONLY select the bottom one, the modified fit units, and uncheck all the others. Each workunit will then run on a single CPU core and not use all the CPU cores on your system. The n-body units run very well for some people, but since we are not all clones and each use our PCs differently, we must sometimes tweak the systems to work with us, not against us. |
Joined: 25 Aug 14 Posts: 2 Credit: 7,991 RAC: 0 |
Well, I decided I'm aborting the ps_nbody. It's due 7/09 and still calculates to 177hrs of processing even though it says it will be finished in 2 hrs. Simple fact is I'm running a 2.4 GHz processor overclocked 10% and can't go any faster. So someone with more speed can have a go and maybe finish it earlier. |
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
"Well I decided I'm aborting the ps_nbody. Its due 7/09 and still calculates to 177hrs processing even though it says it will be finished in 2 hrs. Simple fact is I'm running a 2.4GHz processor overclocked 10% and can't go any faster. So someone with more speed can have a go and maybe finish it earlier." Yeah, it happens. Work units are not identical (otherwise why are we 'looking' at them?). Your unit will just get sent to someone else, and if that happens too often the project itself may even have to give it a go. And even then it could just turn out to be too full of static or junk and be uncrunchable; it happens sometimes, fortunately rarely, but.... |
Joined: 3 May 10 Posts: 74 Credit: 1,532,760 RAC: 0 |
Good call, Mikey. I have followed your instructions and now I am getting tasks that I CAN calculate. No more aborted tasks. I am happy and MW@H is happy. |
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
"Good call Mikey. I have followed your instructions and now I am getting tasks that I CAN calculate. No more aborted tasks I am happy and MW@H is happy." I am glad it is working for you. I guess I need to keep an eye on the mirror for you catching me now... note to self: don't help so much!! Just kidding; if you catch and pass me, the project benefits, and that TOO is a good thing. |
Joined: 8 Apr 11 Posts: 3 Credit: 46,155,607 RAC: 0 |
Your CPU has 4 full cores plus 4 hyper-threaded logical ones. I have only one machine with an Intel hyper-threaded CPU, and the performance of the hyper-thread can vary widely. For some stuff it performs just like a dual core, but for other stuff, the second thread seems to help much less. The best explanation I've come up with is that small repetitive chunks of code, like benchmarks, that stay within the L1 cache will produce results almost the same as a full dual core. For programs that have a long set of instructions being fetched from outside the cache, the secondary thread helps by getting the next instruction ready while the main core is executing. When two long sets of instructions run on both a main core and its secondary thread, the performance for me was not close to that of two full cores. My experience with this was not with BOINC apps, but with some statistical programming I was doing some time ago. Intel improved hyper-threading with later releases, and compilers have also improved their optimization for hyper-threading, so results can vary. In general, Intel's CPUs are faster at floating-point math than AMD's, but AMD's real hex cores are sometimes faster than Intel's hyper-threaded quad cores (8 threads). My experience with N-Body tasks is that they seldom take more than 3 hours on my machines, which have 4 hex-core AMD Opteron CPUs. The OpenMP multi-threading used in the N-Body tasks is limited to using only 16 of my 24 cores, which is fine since I normally keep BOINC using only 20 of the 24 cores, so that the other 4 cores are available for IO, system overhead, and feeding data to GPU tasks running on the machine as well. With all 24 cores in use, the contention for resources was so high that overall CPU utilization was lower with 24 tasks than with 20 tasks, since many of the 24 tasks were in wait states much of the time. Similarly, your hyper-threaded tasks may be interrupting tasks on the main core. For example, I use the cc_config.xml file shown at the end of this post. You could try the "ncpus" option: let it run tasks for a day at 8 and compare the run times with 4, or something in between. To compare the run times with different settings, use the following formula: RunTime / threads (8) * ncpus (4), which scales the 8-thread run time to the break-even run time at ncpus = 4. If your run times are something like 8 hours with ncpus = 8 and 4 hours with ncpus = 4, then you lose nothing by using all 8 threads. If your run time with ncpus = 4 is less than half the run time with 8 threads, then you are better off with the lower ncpus setting. Multi-threaded performance estimation is far from straightforward. If your 8 threads were all full cores at 3.4 GHz, your machine would likely run faster than my 24 cores at 2.4 GHz. For a single task, yours is almost 50% faster, but with 20 separate tasks running, mine would complete more tasks in a given time. For multi-threaded tasks like the n-body tasks, the results are less clear. For this reason, sometimes the result validation is not as clear as with other BOINC tasks, and it takes some time before you get credit on n-body tasks. On the other hand, once you get your system configured, n-body multi-threaded tasks can eat lots of data quickly. It's the same principle that lets GPU tasks run so quickly. Some work is not easily ported to OpenCL on GPUs, which have limited instruction capability compared to CPUs. These OpenMP tasks, like n-body, allow the programmers to use all of the complex instructions of the CPU and line up multiple threads, knocking over dominoes quickly like a GPU with its more limited instruction set. Good luck, Greg. `<cc_config> <options> <allow_remote_gui_rpc>1</allow_remote_gui_rpc> <use_all_gpus>1</use_all_gpus> <fetch_on_update>1</fetch_on_update> <exclusive_app>synaptic</exclusive_app> <exclusive_gpu_app>vlc</exclusive_gpu_app> <start_delay>60</start_delay> <ncpus>20</ncpus> </options> </cc_config>` |
Joined: 2 Feb 09 Posts: 3 Credit: 1,443,819 RAC: 0 |
It seems that some workunits run for a very long time. Workunit 624805106 ran on this host for approx. 174h of CPU time: completed, validation inconclusive. Workunit 624880629 is still running on this host; runtime so far is about 105h of CPU time, 83% done. Another one, 840209104, is at 2.5% after 2h45m on this host. What is wrong with these MT units? Or should I just abort them when I see that one is going to be a long one? |
Joined: 18 Jul 09 Posts: 300 Credit: 303,673,545 RAC: 3,658 |
My experience was that it was a bit of a gamble. I have seen them run for a very long time, and then validate, and run a long time just to error out. I got to the point where if they ran longer than a day I would just abort them, regardless of the estimated time to completion. |
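
Several posts in this thread estimate the real remaining time by hand from the "Elapsed" and "Progress" columns, for example 13:45 elapsed at 7.657% progress. A minimal Python sketch of that arithmetic follows. It assumes progress accrues at a constant rate, which the thread suggests is not guaranteed for N-body tasks, and the function names are hypothetical (they are not part of BOINC).

```python
def parse_hhmm(text: str) -> float:
    """Convert an 'HH:MM' elapsed-time string to fractional hours."""
    hours, minutes = text.split(":")
    return int(hours) + int(minutes) / 60.0


def estimate_remaining_hours(elapsed_hours: float, progress_percent: float) -> float:
    """Linearly extrapolate the remaining run time from elapsed time and progress.

    Assumption: the task advances at a constant rate, so
    total = elapsed * 100 / progress.
    """
    if not 0 < progress_percent <= 100:
        raise ValueError("progress_percent must be in (0, 100]")
    total_hours = elapsed_hours * 100.0 / progress_percent
    return total_hours - elapsed_hours


if __name__ == "__main__":
    # Figures quoted in the thread: Elapsed = 13:45, Progress = 7.657%.
    elapsed = parse_hhmm("13:45")
    remaining = estimate_remaining_hours(elapsed, 7.657)
    print(f"total ~{elapsed + remaining:.0f} h, remaining ~{remaining:.0f} h")
```

With the figures from the thread this gives roughly 180 hours in total and 166 hours remaining, matching the poster's hand calculation.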
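
Greg's rule of thumb for comparing two `ncpus` settings (RunTime / threads * ncpus) can also be sketched in Python. This is only an illustration under a simplifying assumption: BOINC runs one single-threaded task per allowed CPU, so throughput is `ncpus / run_time`. Multi-threaded N-body tasks will not follow this model exactly, and all function names here are hypothetical.

```python
def tasks_per_hour(ncpus: int, run_time_hours: float) -> float:
    """Throughput if ncpus single-threaded tasks run side by side (an assumption)."""
    if ncpus <= 0 or run_time_hours <= 0:
        raise ValueError("ncpus and run_time_hours must be positive")
    return ncpus / run_time_hours


def break_even_run_time(run_time_hours: float, threads: int, ncpus: int) -> float:
    """The formula from the post: RunTime / threads * ncpus.

    Returns the per-task run time at the lower ncpus setting that would give
    the same throughput as the measured run time at the higher setting.
    """
    return run_time_hours / threads * ncpus


def prefer_lower_ncpus(time_high: float, ncpus_high: int,
                       time_low: float, ncpus_low: int) -> bool:
    """True only if tasks at the lower setting finish faster than break-even."""
    return time_low < break_even_run_time(time_high, ncpus_high, ncpus_low)


if __name__ == "__main__":
    # Example from the post: 8 h per task at ncpus = 8 vs 4 h at ncpus = 4.
    print(break_even_run_time(8.0, threads=8, ncpus=4))   # break-even run time
    print(prefer_lower_ncpus(8.0, 8, 4.0, 4))             # tie: keep all 8
    print(prefer_lower_ncpus(8.0, 8, 3.0, 4))             # 3 h beats break-even
```

In the post's own example (8 hours at 8 threads, 4 hours at 4) the two settings tie, so the sketch keeps the higher setting; only a run time below the break-even value favours the lower `ncpus`.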
©2025 Astroinformatics Group