Welcome to MilkyWay@home

Long crunch time on new N-Body simulations?



Message boards : Number crunching : Long crunch time on new N-Body simulations?

Bill
Joined: 8 Jan 18
Posts: 35
Credit: 11,139,026
RAC: 67,898
Message 68734 - Posted: 16 May 2019, 12:15:14 UTC

I've noticed I have a few N-Body tasks that are taking way longer than I had experienced in the past. What would normally take a few hours is creeping into taking 24 hours or more.

Not all of my N-Body tasks are this way, but a few of them are here and here.

I'm letting them crunch through. They don't seem to be "stuck", just taking a while. They should be able to complete before the deadline, assuming no delays. I just wanted to see if anyone else is experiencing this.
ID: 68734 · Rating: 0
Jim1348

Joined: 9 Jul 17
Posts: 63
Credit: 9,826,922
RAC: 796
Message 68736 - Posted: 16 May 2019, 13:57:39 UTC - in response to Message 68734.  

Yes, I saw it. I think it is normal.
ID: 68736 · Rating: 0
Link
Joined: 19 Jul 10
Posts: 357
Credit: 16,320,358
RAC: 0
Message 68743 - Posted: 17 May 2019, 19:57:11 UTC - in response to Message 68734.  

Well, this isn't very surprising since the new application is using only one single core.
ID: 68743 · Rating: 0
Bill
Joined: 8 Jan 18
Posts: 35
Credit: 11,139,026
RAC: 67,898
Message 68744 - Posted: 17 May 2019, 20:47:34 UTC - in response to Message 68743.  

[quote]Well, this isn't very surprising since the new application is using only one single core.[/quote]
Ok, true, but I think the estimated completion time is being under-estimated. I had some tasks that took 24-26 hours to complete, but I think they were originally estimated at 11 hours or so. I didn't pay close enough attention to know for sure if this is the case. I'll have to check upcoming tasks to see if this is really the case or not.
ID: 68744 · Rating: 0
marmot
Joined: 12 Dec 15
Posts: 43
Credit: 7,028,543
RAC: 0
Message 68752 - Posted: 19 May 2019, 3:52:19 UTC - in response to Message 68744.  

[quote][quote]Well, this isn't very surprising since the new application is using only one single core.[/quote]
Ok, true, but I think the estimated completion time is being under-estimated. I had some tasks that took 24-26 hours to complete, but I think they were originally estimated at 11 hours or so. I didn't pay close enough attention to know for sure if this is the case. I'll have to check upcoming tasks to see if this is really the case or not.[/quote]


The BOINC client keeps a running average of completion times to estimate completion, and the old runtimes outweigh the new runtimes in the average.

I think if you set <rec_half_life_days>0</rec_half_life_days> in cc_config.xml, restart BOINC, run it for an hour, and then set it back to the default of 10 days (<rec_half_life_days>10</rec_half_life_days>), you'll reset the running averages (of all WUs), and the estimate should be close to the right number within 24 hours.

I don't usually worry about the estimate (it's almost always wrong) and so haven't tested this.
https://boinc.berkeley.edu/wiki/Client_configuration
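
For reference, the cc_config.xml edit described above could be scripted roughly as follows. This is a minimal sketch, not a tested procedure: the helper name set_rec_half_life is hypothetical, and it assumes the option sits under <options> as on the client configuration page linked above. As far as I recall, that page describes rec_half_life_days as governing project scheduling priority, so whether it also resets runtime estimates remains the untested suggestion made in this post.

```python
import xml.etree.ElementTree as ET


def set_rec_half_life(config_xml: str, days: float) -> str:
    """Return cc_config.xml text with <rec_half_life_days> set to `days`.

    Hypothetical helper: it only rewrites the XML text and does not
    locate the BOINC data directory or restart the client.
    """
    root = ET.fromstring(config_xml)
    if root.tag != "cc_config":
        raise ValueError("expected a <cc_config> root element")
    options = root.find("options")
    if options is None:
        # Create the <options> section if the file does not have one yet.
        options = ET.SubElement(root, "options")
    elem = options.find("rec_half_life_days")
    if elem is None:
        elem = ET.SubElement(options, "rec_half_life_days")
    elem.text = f"{days:g}"
    return ET.tostring(root, encoding="unicode")


# Example: shorten the half-life to one day in an otherwise empty config.
example = "<cc_config><options></options></cc_config>"
print(set_rec_half_life(example, 1))
```

The client would still need to re-read its config file (or be restarted) before any change takes effect.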
ID: 68752 · Rating: 0
Keith Myers
Joined: 24 Jan 11
Posts: 310
Credit: 159,299,724
RAC: 422,157
Message 68756 - Posted: 19 May 2019, 18:12:41 UTC - in response to Message 68752.  

[quote]The BOINC client keeps a running average of completion times to estimate completion, and the old runtimes outweigh the new runtimes in the average.

I think if you set <rec_half_life_days>0</rec_half_life_days> in cc_config.xml, restart BOINC, run it for an hour, and then set it back to the default of 10 days (<rec_half_life_days>10</rec_half_life_days>), you'll reset the running averages (of all WUs), and the estimate should be close to the right number within 24 hours.

I don't usually worry about the estimate (it's almost always wrong) and so haven't tested this.[/quote]


This is the correct way to update the estimated times. Or if you want an estimate that changes more rapidly based on a quickly changing data mix, set <rec_half_life_days>10</rec_half_life_days> to <rec_half_life_days>1</rec_half_life_days> and your estimates will only average over the last day.

That is what I run on my clients since Seti has a fairly diverse data mix that changes daily.
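
To illustrate why a shorter half-life makes an average react faster, here is a generic sketch of half-life weighting. It is not BOINC's actual code; the function name and the sample ages are made up for illustration. A sample that is one half-life old counts half as much as a fresh one.

```python
def half_life_weight(age_days: float, half_life_days: float) -> float:
    """Weight of a sample that is `age_days` old under exponential decay."""
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    return 0.5 ** (age_days / half_life_days)


# Compare the default half-life of 10 days with a half-life of 1 day.
for half_life in (10.0, 1.0):
    weights = [half_life_weight(age, half_life) for age in (0, 1, 5, 10)]
    print(half_life, [round(w, 3) for w in weights])
```

With a 10-day half-life a five-day-old sample still carries about 70% of its original weight, while with a 1-day half-life it carries about 3%, so older history fades much more quickly.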
ID: 68756 · Rating: 0
Bill
Joined: 8 Jan 18
Posts: 35
Credit: 11,139,026
RAC: 67,898
Message 68759 - Posted: 19 May 2019, 23:01:26 UTC

Thanks for the input Marmot and Keith. The reason I noticed this was because I am running MW, Einstein, and Seti all on the same machine. Einstein also has some tasks that are taking about 10 hours. I had several tasks from MW and Einstein that needed a lot of compute time and their due dates were approaching.

I suspect now that I just had the "store at least" and "store up to an additional" days-of-work settings a little high. Combine that with two projects whose newer tasks take longer to compute than before, and that is probably what caused my concern. I had the settings at 1 and 5, but I have dialed them back to 0.5 and 2. We'll see how it goes from there.
ID: 68759 · Rating: 0
Hal Bregg

Joined: 28 Dec 18
Posts: 12
Credit: 1,066,563
RAC: 633
Message 68760 - Posted: 20 May 2019, 8:31:48 UTC - in response to Message 68743.  
Last modified: 20 May 2019, 8:39:46 UTC

[Deleted]
ID: 68760 · Rating: 0
Bill
Joined: 8 Jan 18
Posts: 35
Credit: 11,139,026
RAC: 67,898
Message 68862 - Posted: 17 Jun 2019, 12:55:02 UTC

So, I think something is going on here. I currently have four N-Body 1.76 tasks running that so far have elapsed 2 or 3 DAYS, with most of them still showing 10+ hours remaining. These tasks were downloaded over a week ago (6 June 2019, 9 June 2019), and I am pretty sure I would have noticed estimated times that long. Marmot, I did try adjusting the half life as you suggested and it had not picked up this discrepancy in ETA. This morning I have adjusted my half life setting down to 1.

I feel that this is a problem that shouldn't be happening. The CPU, despite being in a laptop, is from 2017, so it isn't a slow processor. It has a higher GFLOPS/core than an i7-8700. Is there any kind of debugging I can do to evaluate this?
ID: 68862 · Rating: 0
swiftmallard
Joined: 18 Jul 09
Posts: 299
Credit: 303,442,173
RAC: 0
Message 68945 - Posted: 2 Aug 2019, 22:03:05 UTC - in response to Message 68862.  

[quote]So, I think something is going on here. I currently have four N-Body 1.76 tasks running that so far have elapsed 2 or 3 DAYS, with most of them still showing 10+ hours remaining. These tasks were downloaded over a week ago (6 June 2019, 9 June 2019), and I am pretty sure I would have noticed estimated times that long. Marmot, I did try adjusting the half life as you suggested and it had not picked up this discrepancy in ETA. This morning I have adjusted my half life setting down to 1.

I feel that this is a problem that shouldn't be happening. The CPU, despite being in a laptop, is from 2017, so it isn't a slow processor. It has a higher GFLOPS/core than an i7-8700. Is there any kind of debugging I can do to evaluate this?[/quote]

My aging dual core AMD Phenom II is processing N-Body 1.76 tasks in just a few hours; all have been pretty close to the estimated time.
ID: 68945 · Rating: 0
Keith Myers
Joined: 24 Jan 11
Posts: 310
Credit: 159,299,724
RAC: 422,157
Message 68946 - Posted: 3 Aug 2019, 3:05:28 UTC - in response to Message 68945.  

When the run_time greatly exceeds the cpu_time, that indicates a cpu that is overcommitted. Try running fewer tasks. Or reduce the number of background processes that are stealing cpu cycles from the crunching.
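
The run_time versus cpu_time comparison above can be expressed as a simple ratio. The sketch below is only an illustration: the function names, the 0.8 threshold, and the sample numbers are assumptions, not values taken from BOINC or from any task in this thread.

```python
def cpu_efficiency(cpu_time_s: float, run_time_s: float, threads: int = 1) -> float:
    """Fraction of wall-clock time the task actually spent on the CPU."""
    if run_time_s <= 0 or threads < 1:
        raise ValueError("run_time_s must be positive and threads at least 1")
    return cpu_time_s / (run_time_s * threads)


def looks_overcommitted(cpu_time_s: float, run_time_s: float,
                        threads: int = 1, threshold: float = 0.8) -> bool:
    """True if CPU time is well below run time (threshold is an assumption)."""
    return cpu_efficiency(cpu_time_s, run_time_s, threads) < threshold


# Hypothetical single-threaded task: 24 hours elapsed but only 6 hours of CPU time.
print(cpu_efficiency(6 * 3600, 24 * 3600))       # 0.25
print(looks_overcommitted(6 * 3600, 24 * 3600))  # True
```

A ratio near 1.0 suggests the task had a core to itself; a much lower ratio suggests it was waiting on other work.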
ID: 68946 · Rating: 0
Bill
Joined: 8 Jan 18
Posts: 35
Credit: 11,139,026
RAC: 67,898
Message 68979 - Posted: 19 Aug 2019, 15:49:08 UTC

Sorry for the late response, but I think I know the culprit. I was crunching Intel GPU tasks for Seti@home and that slowed everything down. I don't know why I didn't figure that out in the first place. Times have sped up significantly since I stopped with the iGPU tasks.
ID: 68979 · Rating: 0
Bill
Joined: 8 Jan 18
Posts: 35
Credit: 11,139,026
RAC: 67,898
Message 69139 - Posted: 29 Sep 2019, 22:57:06 UTC

I'm back. Same problem, different computer (located here). All my MW N-body 1.76 tasks that haven't started have an ETA between about 50 minutes (minimum) and 3 hours (maximum). Today, I have two that are running, both about 10 hours completed, but now they show completion times between 15 and 19 hours. When I checked on this computer last night I had NNT set for MW, and none of the N-body tasks had an ETA that long. I also had a task that had an ETA of about 2 hours when I started it earlier today, and now the ETA is about 20 hours remaining after running 6.5 hours. It appears that the initial ETA for these tasks is underestimated by a factor of eight!

I'm concerned that two things are happening here. First, that I downloaded more N-body tasks than my CPU can crunch in the two-week timeframe (even ignoring other projects). Second, I'm afraid the deadlines will pass for the other projects (i.e., MW) because my computer will have spent too much time on the N-body tasks. Basically, my suspicion is that with underestimated ETAs, BOINC will download more tasks than my computer is capable of crunching in the allotted time.

I currently have my computer set to suspend any Seti & Einstein tasks, and I have suspended all non-N-body tasks for MW. I also have Einstein & MW set to NNT (Seti is just plain old suspended...their deadlines are very generous). So, all four cores are crunching MW all the time, and no GPUs are running to complicate/burden the CPU tasks. I'm hoping that by exposing the underestimated MW tasks early I can get them done in time for ALL tasks to be completed before their deadlines.

Is there a way to get the initial ETA estimation set higher for these tasks? Without knowing the code, I'm guessing this is something that should be sorted out on the server end, not the user end.
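
The size of the underestimate can be worked out from the figures quoted in this post. The sketch below is plain arithmetic under one assumption: the current "remaining" value is taken at face value, even though it is itself only an estimate. The function name is hypothetical.

```python
def underestimate_factor(initial_eta_h: float, elapsed_h: float,
                         remaining_h: float) -> float:
    """Ratio of the currently projected total runtime to the initial ETA."""
    if initial_eta_h <= 0:
        raise ValueError("initial_eta_h must be positive")
    return (elapsed_h + remaining_h) / initial_eta_h


# The task described above: about 2 h initial ETA, 6.5 h elapsed, about 20 h remaining.
print(underestimate_factor(2.0, 6.5, 20.0))  # 13.25
```

For that single task the projected total is about 13 times the initial ETA, which is the same order of magnitude as the rough factor of eight mentioned above.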
ID: 69139 · Rating: 0
Bill
Joined: 8 Jan 18
Posts: 35
Credit: 11,139,026
RAC: 67,898
Message 69156 - Posted: 5 Oct 2019, 3:55:53 UTC

I don't think I'm the only one that has had this problem. Check out https://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=1796656112. Two tasks timed out while computing, one didn't start before the deadline, and only one task completed.
ID: 69156 · Rating: 0
Nick Name

Joined: 27 Jul 14
Posts: 21
Credit: 420,478,436
RAC: 1,268,093
Message 69157 - Posted: 5 Oct 2019, 16:42:27 UTC - in response to Message 69139.  
Last modified: 5 Oct 2019, 16:42:46 UTC

[quote]Is there a way to get the initial ETA estimation set higher for these tasks? Without knowing the code, I'm guessing this is something that should be sorted out on the server end, not the user end.[/quote]

The best way to handle this is to lower your cache (aka Store at least N days of work in your preferences), at least until the estimates are more accurate. I don't usually set mine to more than half a day, and I set it that low precisely because of seeing this problem.
Team USA forum | Team USA page
Always crunching / Always recruiting
ID: 69157 · Rating: 0
Bill
Joined: 8 Jan 18
Posts: 35
Credit: 11,139,026
RAC: 67,898
Message 69159 - Posted: 5 Oct 2019, 18:51:38 UTC - in response to Message 69157.  

That's fair, but when my settings are a minimum of 2 days, 5 days additional, how is it I can't complete tasks within a 2 week deadline? The computer runs 24/7 crunching, 100% of the CPU time and all cores. I am not restricting boinc from attempting to complete tasks at all.
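
One possible reading of the question above, sketched as arithmetic. It assumes a deliberately simplified model in which the client fills the whole buffer (minimum plus additional days) using the too-low estimates, everything fetched is the same kind of task, and the factor of eight mentioned earlier in the thread applies. The real work-fetch logic is more involved, and the function names are hypothetical.

```python
def real_days_to_clear(min_days: float, extra_days: float, factor: float) -> float:
    """Days actually needed to finish a buffer sized from underestimated ETAs."""
    if min_days < 0 or extra_days < 0 or factor <= 0:
        raise ValueError("buffer sizes must be non-negative and factor positive")
    return (min_days + extra_days) * factor


def finishes_before_deadline(min_days: float, extra_days: float, factor: float,
                             deadline_days: float = 14.0) -> bool:
    """True if the whole buffer can be cleared within the deadline."""
    return real_days_to_clear(min_days, extra_days, factor) <= deadline_days


# Settings from the post: 2 days minimum plus 5 days additional.
print(real_days_to_clear(2, 5, 8))          # 56
print(finishes_before_deadline(2, 5, 8))    # False
# A half-day cache with no additional buffer.
print(finishes_before_deadline(0.5, 0, 8))  # True
```

Under these assumptions a seven-day buffer turns into roughly 56 days of real work, well past a two-week deadline, while a half-day cache stays at about four days.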
ID: 69159 · Rating: 0
Nick Name

Joined: 27 Jul 14
Posts: 21
Credit: 420,478,436
RAC: 1,268,093
Message 69161 - Posted: 6 Oct 2019, 15:02:10 UTC - in response to Message 69159.  

My guess is that the combination of work now taking a lot longer than before, plus multi-threaded work running again has really confused things. The client is also bad at accounting for anything in app_config that might affect run times, specifically max_concurrent, so if you are using those that might be part of the problem. Regardless, a low cache is the only way to keep from getting too much work.
Team USA forum | Team USA page
Always crunching / Always recruiting
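
For readers unfamiliar with the app_config setting mentioned above, this sketch builds a minimal app_config.xml containing a max_concurrent limit. The helper name is hypothetical, and the application name "milkyway_nbody" is an assumption that should be checked against the project's own application list before use.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, max_concurrent: int) -> str:
    """Return app_config.xml text limiting how many tasks of one app run at once."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    return ET.tostring(root, encoding="unicode")


# "milkyway_nbody" is an assumed app name, used here only for illustration.
print(build_app_config("milkyway_nbody", 2))
```

As the post notes, a limit like this is exactly the kind of setting the client may not account for when it estimates how much work to fetch.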
ID: 69161 · Rating: 0
swiftmallard
Joined: 18 Jul 09
Posts: 299
Credit: 303,442,173
RAC: 0
Message 69162 - Posted: 6 Oct 2019, 15:17:54 UTC - in response to Message 69159.  

[quote]That's fair, but when my settings are a minimum of 2 days, 5 days additional, how is it I can't complete tasks within a 2 week deadline? The computer runs 24/7 crunching, 100% of the CPU time and all cores. I am not restricting boinc from attempting to complete tasks at all.[/quote]

Very few people can crunch both n-body and separation WUs on the same computer. The change from single thread to multi thread n-body work is only going to make it even more difficult to deal with the scheduler.
ID: 69162 · Rating: 0
Bill
Joined: 8 Jan 18
Posts: 35
Credit: 11,139,026
RAC: 67,898
Message 69163 - Posted: 8 Oct 2019, 1:04:22 UTC - in response to Message 69162.  

Well, for now I'm going to just stick with the separation tasks and not crunch n-body. I only received a few 4 CPU n-body tasks; the rest were 1 core tasks. This seems more like a bug than a feature to me. IMHO, this would be a deterrent to people that just want to "set and forget" BOINC. However, I think I've stood on this soap box for long enough, so I'll drop this for now unless someone else wants to address this.
ID: 69163 · Rating: 0
rbpeake

Joined: 29 Aug 07
Posts: 19
Credit: 421,261
RAC: 0
Message 69232 - Posted: 6 Nov 2019, 23:30:32 UTC

Is there a difference in runtimes between the work units? A recent 4-core unit took longer than a 3-core unit!
Regards,
Bob P.
ID: 69232 · Rating: 0

©2020 Astroinformatics Group