Welcome to MilkyWay@home

N-Body long processing time

Message boards : Number crunching : N-Body long processing time

guizalan
Joined: 20 Feb 14
Posts: 6
Credit: 20,780,697
RAC: 0
Message 62167 - Posted: 15 Aug 2014, 0:27:55 UTC

Hi,

I just saw that the N-body task is estimated to take a long time to process. Is this correct? Even on a good Intel i7 (3770, 3.4 GHz)?

ID: 62167 · Rating: 0

swiftmallard
Joined: 18 Jul 09
Posts: 300
Credit: 303,562,776
RAC: 0
Message 62168 - Posted: 15 Aug 2014, 1:01:00 UTC

While it may be a longer work unit than other n-bodies in your cache, those time estimates are notoriously unreliable. Keep an eye on it; if it's taking too long, abort it. But it will probably be fine.
ID: 62168 · Rating: 0

SuperSluether
Joined: 2 Jul 14
Posts: 15
Credit: 20,991,384
RAC: 46
Message 62185 - Posted: 16 Aug 2014, 16:59:14 UTC - in response to Message 62167.  

It looks like it's just a ridiculous estimated time. I have an i7 4770 running at 3.5 GHz, and I've only ever gotten 1 task that took that long. It was taking too long, so after about 12 hours I aborted it manually.
ID: 62185 · Rating: 0

John Black
Joined: 3 May 10
Posts: 74
Credit: 1,532,760
RAC: 0
Message 62205 - Posted: 22 Aug 2014, 14:54:19 UTC

Hi,

What is wrong with these N-body tasks? I have a record, for me: a task with 113 hours of processing time. I also have a few completed tasks that produced no results, i.e. 6 "Validation inconclusive" and 2 "Can't validate".
The estimated calculation times are also rubbish; sometimes when I reach the estimate I have only done 10% of the task.

I am running a dual core. Around 6th August, when all the tasks were erroring out, I contacted the guy who posted these tasks, and he advised me to abort them.

I think that I may stop calculating these until they have been removed from the task list, as I don't like to abort them.

John

P.S. I tried to post an image with this but I don't know how to do it. I have the .jpg but cannot post it here.

ID: 62205 · Rating: 0

Alan Barnes
Joined: 30 Nov 13
Posts: 7
Credit: 2,147,568
RAC: 0
Message 62206 - Posted: 22 Aug 2014, 22:00:30 UTC

I too am experiencing problems with N-body simulation v1.42 -- a job ran for almost 17 hours on 2 CPUs, reached 100%, and then promptly crashed with a computation error.

Task: 805862330
Work unit: 596473367
Sent: 12 Aug 2014, 14:32:47 UTC
Reported: 22 Aug 2014, 21:41:18 UTC
Status: Error while computing
Run time (sec): 60,819.13
CPU time (sec): 103,158.20
Credit: ---
Application: MilkyWay@Home N-Body Simulation v1.42 (mt)

It was running on
GenuineIntel
Pentium(R) Dual-Core CPU T4500 @ 2.30GHz [Family 6 Model 23 Stepping 10]
(2 processors)


We are not amused!!

Alan Barnes
ID: 62206 · Rating: 0

mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,939,926
RAC: 22,722
Message 62207 - Posted: 23 Aug 2014, 10:15:28 UTC - in response to Message 62206.  

A suggestion might be to stop running the modified fit units and run something like this instead:
Run only the selected applications
MilkyWay@Home: yes
MilkyWay@Home N-Body Simulation: no
Milkyway@Home Separation: yes
Milkyway@Home Separation (Modified Fit): no

The modified fit units are multi-tasking, multi-core units, meaning they use multiple CPU cores to run, and they can be troublesome even on the fastest PCs.
ID: 62207 · Rating: 0

tomd@perso.be
Joined: 21 Aug 13
Posts: 2
Credit: 117,645
RAC: 0
Message 62220 - Posted: 27 Aug 2014, 10:14:58 UTC

ps_nbody_08_05_orphan_sim_1_1405680903_433234_3, using milkyway_nbody version 142, was supposed to run for 4.5 hours (approx.) but ran for 33h. It was clear from the beginning that it would take that long, so the percentage of completion was OK, but the scheduled length was 7-8 times too short.
It ran OK and didn't fail.

I see that I have other 1.42 simulations (and the current one will take around 20-30h if the percentages are correct, instead of the 35 minutes planned).

I have 2 questions:

1) Am I wasting my time on these? i.e. Is my computer for some reason much slower on these jobs than it should be? And therefore, should I avoid these jobs (for my own sake and that of the project)?

2) I am running other projects. Are they going to be disadvantaged?
What I mean is: if I receive credits based on an estimated 4.5h job when the job actually ran 8 times longer, will the other projects be queued because BOINC thinks MilkyWay@home hasn't received enough "time-share"? Or does BOINC take into account the actual CPU run time of the different projects, and therefore compensate for the long run times of MilkyWay (and not allow it 8 times more time-share than other projects)?

Thomas
ID: 62220 · Rating: 0

mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,939,926
RAC: 22,722
Message 62221 - Posted: 27 Aug 2014, 10:35:23 UTC - in response to Message 62220.  
Last modified: 27 Aug 2014, 10:38:36 UTC

You don't get credits based on how long they think the job will take; you get credits based on how much work your PC does, with older PCs doing less work in the same amount of time than newer PCs.

2: Yes, BOINC will adjust and you will get less work from the other projects. You CAN compensate somewhat, but not totally.

1: IMHO yes, if you can. Not all work units run well on every PC; if these are taking too long and affecting your other projects, then you might try running something else instead and see what happens. Your problem seems to be the total number of projects and units you are trying to run at once; that, combined with the multi-core n-body units, is causing issues for you. I would turn off the n-body units and run a different type of CPU unit, which won't take multiple cores to run, and see if that is better for your setup.

PS: As long as the units get returned within their deadlines, most projects don't really care how long it takes you to crunch a unit. They care when EVERYONE is taking longer than they expected when they made a unit, but a single PC taking that much longer they will just write off as a one-off exception.
ID: 62221 · Rating: 0

tomd@perso.be
Joined: 21 Aug 13
Posts: 2
Credit: 117,645
RAC: 0
Message 62222 - Posted: 27 Aug 2014, 10:44:55 UTC - in response to Message 62221.  

I'll avoid n-body then.

Thank you very much!

thomas
ID: 62222 · Rating: 0

Presrvd
Joined: 4 Jul 14
Posts: 8
Credit: 64,068,552
RAC: 0
Message 62223 - Posted: 27 Aug 2014, 21:54:47 UTC

I'm still a little confused. I've been running the N-bodies just fine for a month or so, and only recently did things start getting weird. I have been running a single N-body task now for nearly 48 hours, and it's only at 37%. The "Remaining (Estimated)" time block is actually still counting up, not down. Seems like a lot of things are going a little haywire here. Should I be killing this task, or let it run its course?
ID: 62223 · Rating: 0

mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,939,926
RAC: 22,722
Message 62224 - Posted: 28 Aug 2014, 10:27:42 UTC - in response to Message 62223.  

That sort of depends on you. If you feel okay letting it run, then let it run; it most likely IS working. But if you feel like it, just abort it and move on to the next one. Aborting a unit is not the end of the world; it just gets resent to someone else to crunch. If you abort too many, though, your total daily allowance of units could suffer until you start returning 'valid' units again. Aborting a unit means you did NOT return it as 'valid'; normally a few make no difference, but a lot could.

The n-body units use multiple CPU cores to crunch each unit; they make your PC kind of like a supercomputer, using most or even all of your CPU cores on a single workunit. NOT for 100% of the time, but for parts of the time, and that is part of why the numbers are fluctuating. Another part is that they are all guesstimates based on formulas with data that changes based on whatever your PC is doing. So if you are watching a movie or playing a game, BOINC can't use as much of the CPU cores for itself and the time goes up; leave the PC untouched and the times go back down again. That is why some people have multiple machines: one or two to 'work' on, while the others are 'BOINC only' machines, meaning all they do is crunch. The 'work' machines also crunch, just not quite as fast as the 'BOINC only' machines. It is part of how the big RAC guys get that way.
ID: 62224 · Rating: 0

Strat
Joined: 25 Aug 14
Posts: 2
Credit: 7,991
RAC: 0
Message 62241 - Posted: 2 Sep 2014, 8:26:28 UTC

Hi, also running ps_nbody_08_05_orphan_sim_2 and having troubles. It says 1hr "Remaining" but I calculate a week remaining! "Elapsed"=13:45, "Progress"=7.657%; that's approx. 180 hrs in total, less 14 makes 166 hrs to go (about 1 week). Given the deadline is 7/09/14, I'm inclined to abort, as I wasn't looking to tie up my PC for a week.
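
The back-of-the-envelope estimate above can be sketched in a few lines of Python. This is only an illustration: it assumes progress is roughly linear in elapsed time, which this thread suggests is not guaranteed, and the function name is made up for the example rather than taken from BOINC.

```python
def projected_hours(elapsed_hours, progress_percent):
    """Linearly extrapolate total and remaining run time from progress so far.

    Assumption: the task advances at a constant rate, which may not hold.
    """
    if not 0 < progress_percent <= 100:
        raise ValueError("progress_percent must be in (0, 100]")
    total = elapsed_hours / (progress_percent / 100.0)
    return total, total - elapsed_hours


# Figures from the post: 13:45 elapsed (13.75 h) at 7.657% progress.
total, remaining = projected_hours(13.75, 7.657)
print(f"total ~{total:.0f} h, remaining ~{remaining:.0f} h")
```

With the figures from the post this gives roughly 180 hours in total and 166 hours still to go, matching the manual calculation.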
ID: 62241 · Rating: 0

mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,939,926
RAC: 22,722
Message 62242 - Posted: 2 Sep 2014, 10:31:43 UTC - in response to Message 62241.  

It's your choice whether to abort it. The estimates are NOT RIGHT, not even close; it could end in 10 minutes, or run for an hour and be done, or be done 10 minutes prior to the deadline, or even run longer than the deadline. My suggestion would be to uncheck the n-body units on the webpage, under your account's preferences for this project, where it says:
Run only the selected applications
MilkyWay@Home
MilkyWay@Home N-Body Simulation
Milkyway@Home Separation
Milkyway@Home Separation (Modified Fit)

ONLY select the bottom one, the modified fit units, and uncheck all the others. Each workunit will then run on a single CPU core and not use all the CPU cores on your system. The n-body units run very well for some people, but since we are not all clones and each use our PCs differently, we must sometimes tweak the systems to work with us, not against us.
ID: 62242 · Rating: 0

Strat
Joined: 25 Aug 14
Posts: 2
Credit: 7,991
RAC: 0
Message 62245 - Posted: 2 Sep 2014, 23:38:59 UTC - in response to Message 62242.  

Well, I decided I'm aborting the ps_nbody. It's due 7/09 and still calculates to 177 hrs of processing, even though it says it will be finished in 2 hrs. Simple fact is I'm running a 2.4 GHz processor overclocked 10% and can't go any faster. So someone with more speed can have a go and maybe finish it earlier.
ID: 62245 · Rating: 0

mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,939,926
RAC: 22,722
Message 62250 - Posted: 3 Sep 2014, 12:41:03 UTC - in response to Message 62245.  

Yeah, it happens. Work units are not identical (otherwise why would we be 'looking' at them?). Your unit will just get sent to someone else, and if that happens too often the project itself may even have to give it a go. Even then it could just turn out to be too full of static or junk and be uncrunchable; it happens sometimes, fortunately rarely, but...
ID: 62250 · Rating: 0

John Black
Joined: 3 May 10
Posts: 74
Credit: 1,532,760
RAC: 0
Message 62252 - Posted: 3 Sep 2014, 23:09:09 UTC - in response to Message 62207.  

Good call, Mikey. I have followed your instructions and now I am getting tasks that I CAN calculate. No more aborted tasks; I am happy and MW@H is happy.
ID: 62252 · Rating: 0

mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,939,926
RAC: 22,722
Message 62253 - Posted: 4 Sep 2014, 11:04:05 UTC - in response to Message 62252.  
Last modified: 4 Sep 2014, 11:06:05 UTC

I am glad it is working for you. I guess I need to keep an eye on the mirror for you catching me now... note to self: don't help so much!! Just kidding; if you catch and pass me, the project benefits, and that TOO is a good thing.
ID: 62253 · Rating: 0

Greg Tippitt
Joined: 8 Apr 11
Posts: 3
Credit: 46,155,607
RAC: 0
Message 62306 - Posted: 10 Sep 2014, 4:56:52 UTC - in response to Message 62167.  

Your CPU has 4 full cores plus 4 more hyper-threaded logical ones. I have only one machine with an Intel hyper-threaded CPU, and the performance of the hyper-thread can vary widely. For some stuff it performs just like a dual core, but for other stuff the second thread seems to help much less. The best explanation I've come up with is that small repetitive chunks of code, like benchmarks, that stay within the L1 cache will produce results almost the same as a full dual core.

For programs that have a long set of instructions being fetched from outside the cache, the secondary thread helps by getting the next instruction ready while the main core is executing. When you get two long sets of instructions being run on both a main core and its secondary thread, the performance for me was not close to that of two full cores. My experience with this was not with BOINC apps, but with some statistical programming I was doing some time ago. Intel improved the hyper-threading with later releases, and compilers have also improved their optimization for hyper-threading, so results can vary.

In general, Intel's CPUs are faster at floating point math than AMD's, but AMD's real hex cores are sometimes faster than Intel's hyper-quad 8's. My experience with N-Body tasks is that they seldom take more than 3 hours on my machines, which have 4 hex-core AMD Opteron CPUs. The OpenMP multi-threading used in the N-Body tasks is limited to using only 16 of my 24 cores, which is fine since I normally keep BOINC using only 20 of the 24 cores, so that the other 4 cores are available for IO, system overhead, and feeding data to GPU tasks running on the machine as well. With all 24 cores in use, the contention for resources was so high that overall CPU utilization was lower with 24 tasks than with 20 tasks, since many of the 24 tasks were in wait states much of the time.

Similarly, your hyper-threaded tasks may be interrupting tasks on the main core.

For example, I use the cc_config.xml file below. You could try the "ncpus" option: let it run tasks for a day at 8, then compare the run times with 4, or something in between. To compare the run times with different settings, use the following formula:

RunTime / threads(8) * ncpu(4)

If your run times are something like 8 hours with ncpu = 8
and 4 hours with ncpu = 4, then you are better off using all 8 threads.

If your run time with ncpu = 4 is less than half the run time with 8 threads, then you are better off with the lower ncpu setting.
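
To make the comparison above concrete, here is a small Python sketch. It assumes single-threaded tasks, so that the number of tasks running at once equals the ncpus setting; the function names and the sample run times are illustrative only and do not come from BOINC.

```python
def normalized_run_time(run_time_hours, threads_used, ncpus_target):
    """Apply the formula from the post: RunTime / threads * ncpu.

    Scales a per-task run time measured with `threads_used` concurrent
    tasks to the break-even run time for `ncpus_target` concurrent tasks.
    """
    return run_time_hours / threads_used * ncpus_target


def better_setting(run_time_8, run_time_4):
    """Return the ncpus value (8 or 4) with the higher task throughput."""
    break_even = normalized_run_time(run_time_8, 8, 4)
    # A tie means the extra threads cost nothing, so keep all 8.
    return 4 if run_time_4 < break_even else 8


print(normalized_run_time(8.0, 8, 4))  # break-even run time for ncpus = 4
print(better_setting(8.0, 4.0))        # 8 h at 8 threads vs 4 h at 4
print(better_setting(8.0, 3.0))        # 8 h at 8 threads vs 3 h at 4
```

The break-even idea is simply throughput: 8 tasks finishing every 8 hours is the same rate as 4 tasks finishing every 4 hours, so the lower setting only wins when its run time drops below half.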

Multi-threaded performance estimation is far from straightforward. If your 8 threads were all full cores at 3.4 GHz, your machine would likely run faster than my 24 cores at 2.4 GHz. For a single task, yours is almost 50% faster, but with 20 separate tasks running, mine would complete more tasks in a given time. For multi-threaded tasks like the n-body tasks, the results are less clear.

For this reason, sometimes the result validation is not as clear as with other BOINC tasks, and it takes some time before you get credit on n-body tasks. On the other hand, once you get your system configured, n-body multi-threaded tasks can eat lots of data quickly. It's the same principle that lets GPU tasks run so quickly. Some work is not easily ported to OpenCL on GPUs, which have limited instruction capability compared to CPUs. OpenMP tasks like n-body allow the programmers to use all of the complex instructions of the CPU, and line up multiple threads knocking over dominoes quickly, like a GPU with its more limited instruction set.

Good Luck,
Greg

<cc_config>
  <options>
    <allow_remote_gui_rpc>1</allow_remote_gui_rpc>
    <use_all_gpus>1</use_all_gpus>
    <fetch_on_update>1</fetch_on_update>
    <exclusive_app>synaptic</exclusive_app>
    <exclusive_gpu_app>vlc</exclusive_gpu_app>
    <start_delay>60</start_delay>
    <ncpus>20</ncpus>
  </options>
</cc_config>
ID: 62306 · Rating: 0

WezH
Joined: 2 Feb 09
Posts: 3
Credit: 1,443,819
RAC: 0
Message 62461 - Posted: 3 Oct 2014, 17:49:56 UTC
Last modified: 3 Oct 2014, 18:00:39 UTC

It seems that there are some workunits that run for a very long time.

Workunit 624805106 finished on this host after approx. 174h of CPU time. Completed, validation inconclusive.

Workunit 624880629 is still running on this host; run time so far is about 105h of CPU time, 83% done.

Another one, 840209104, is so far at 2.5% after 2h45m on this host.

What is wrong with these MT units?

Or should I just abort them when I see it is going to be a long one?
ID: 62461 · Rating: 0

swiftmallard
Joined: 18 Jul 09
Posts: 300
Credit: 303,562,776
RAC: 0
Message 62462 - Posted: 3 Oct 2014, 19:49:21 UTC - in response to Message 62461.  


My experience was that it was a bit of a gamble. I have seen them run for a very long time, and then validate, and run a long time just to error out. I got to the point where if they ran longer than a day I would just abort them, regardless of the estimated time to completion.
ID: 62462 · Rating: 0

©2024 Astroinformatics Group