nbody fpops calculation updated

Message boards : News : nbody fpops calculation updated

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0

Message 59180 - Posted: 1 Jul 2013, 2:44:34 UTC

It looks like our fpops calculation for the nbody simulation was way too high and was causing problems. I've updated the work generator, so newly generated workunits should have an estimated fpops approximately 100x lower. Let us know how they're working.

--Travis
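For readers wondering why a 100x change in the fpops estimate matters: a BOINC client derives a task's estimated run time roughly from the workunit's rsc_fpops_est divided by the floating-point speed it expects from the device. The sketch below is a simplified model under that assumption (the real client also applies per-host corrections), and all numbers in it are hypothetical, chosen only to show that the estimate scales linearly with fpops.

```python
# Simplified, hypothetical model of a BOINC-style runtime estimate.
# Assumption: estimate = rsc_fpops_est / device FLOPS. The real client
# also applies per-host correction factors, which are omitted here.

def estimated_runtime_seconds(rsc_fpops_est: float, device_flops: float) -> float:
    """Estimated wall-clock seconds = estimated FLOPs / device FLOPS."""
    if device_flops <= 0:
        raise ValueError("device_flops must be positive")
    return rsc_fpops_est / device_flops

# Illustrative values only, not taken from the project:
old_fpops_est = 1.5e15               # hypothetical pre-change estimate
new_fpops_est = old_fpops_est / 100  # "approximately 100x lower"
host_flops = 5e9                     # hypothetical 5 GFLOPS per core

old_est = estimated_runtime_seconds(old_fpops_est, host_flops)
new_est = estimated_runtime_seconds(new_fpops_est, host_flops)
print(f"old estimate: {old_est / 3600:.1f} h, new estimate: {new_est / 60:.1f} min")
```

Under this model the estimate drops by the same factor as the fpops value, which is why an inflated fpops figure shows up directly as inflated estimated run times.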

Alinator
Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0

Message 59184 - Posted: 1 Jul 2013, 13:58:37 UTC

Were you talking about problems like this?

Jeffery M. Thompson
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 23 Sep 12
Posts: 145
Credit: 10,781,192
RAC: 6,350

Message 59186 - Posted: 1 Jul 2013, 15:52:26 UTC

These work units appear to have been generated before Travis made the change.

The errors appear to be the stack overflow you reported earlier; we are currently working on getting a patch out for it.

The change in fpops_est should change the estimated time we send out with new work units. This only applies to work units generated since Travis made the change, so those work units will be dated July 1st (UTC) or later.

Jeff

jdzukley
Joined: 26 May 11
Posts: 29
Credit: 16,639,739
RAC: 824

Message 59194 - Posted: 2 Jul 2013, 13:53:15 UTC
Last modified: 2 Jul 2013, 13:56:19 UTC

Thanks for the update on the time-to-completion estimates. From what I've observed, they are much improved. The only real variance is that some tasks take 5-6 times longer than estimated. This is more or less insignificant, since the original estimate may be 2.5 minutes with an actual run time of 15 minutes.

I have a question based on observing the overall efficiency of the installed CPU cores. My comments are based on a computer that has 12 cores. Most MT tasks run at an average of roughly 80% total efficiency, as observed in the Windows Resource Manager. Doesn't logic therefore suggest that it would be better to run 12 separate nbody tasks at roughly 100% rather than a single MT task at 80%? Consider also that I have 2 GPU cards installed, and with those cards running at roughly 95% GPU load (as observed with GPU-Z), the machine's total CPU load rarely gets above 85%.

I look forward to any discussion about MT tasks versus single-threaded tasks.

I suppose the answer is to compare average run times versus CPU seconds used. Is there any current review of these results?
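The comparison suggested above can be sketched from the two numbers BOINC reports per task, run time (wall clock) and CPU time. This is a minimal, hypothetical sketch: the function names are illustrative, and the MT sample figures are made up to reproduce the roughly 80% efficiency mentioned in the post, not measured results.

```python
# Hypothetical helpers for comparing tasks from reported run time and CPU time.

def parallel_efficiency(cpu_seconds: float, run_seconds: float, threads: int) -> float:
    """Fraction of the reserved cores kept busy: CPU time / (wall time * threads)."""
    return cpu_seconds / (run_seconds * threads)

def ratio_actual_to_estimate(actual_seconds: float, estimated_seconds: float) -> float:
    """How many times longer a task ran than its estimate."""
    return actual_seconds / estimated_seconds

# The 2.5-minute estimate vs. 15-minute actual run mentioned above:
print(ratio_actual_to_estimate(15 * 60, 2.5 * 60))  # 6.0

# Made-up MT task on 12 threads: 1,000 s wall clock, 9,600 s CPU time.
mt_eff = parallel_efficiency(cpu_seconds=9600, run_seconds=1000, threads=12)
print(f"MT efficiency: {mt_eff:.0%}")  # 80%
```

CPU efficiency alone does not settle which mode is faster; a fair comparison also needs the amount of work each task does, for example workunits or credit completed per hour of wall clock across all cores, on comparable workunits.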

Jeffery M. Thompson
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 23 Sep 12
Posts: 145
Credit: 10,781,192
RAC: 6,350

Message 59199 - Posted: 2 Jul 2013, 18:55:09 UTC

On the first part, about the estimates being off:
We changed the calculation to get a better order-of-magnitude estimate.
We have the data to fine-tune this; the hard part is getting it mostly right, because the input parameters can span a wide range and still be valid.
We expect to go through one or two more iterations of this, each a finer-grained improvement.
We hope the estimates are now closer and are helping with other scheduling issues.

As for the second part, some work units generate more force equations than others.
When there are many force equations, MT will be better.

So not all work units are the same, and in aggregate MT does appear to be better than non-MT.
That said, we are looking at optimizing how we set up initial conditions to improve efficiency.
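One standard way to see why more force equations favor MT is Amdahl's law. The sketch below is a toy model under stated assumptions, not the project's measurements: each task is assumed to have a fixed serial part (for example, setting up initial conditions) plus parallelizable work that grows with the number of force calculations. The parameter names and values are hypothetical.

```python
# Toy Amdahl's-law model (assumption, not measured project data):
#   serial_s   = seconds spent in non-parallel parts (e.g. setup)
#   parallel_s = single-threaded seconds of the work that splits across threads

def mt_efficiency(serial_s: float, parallel_s: float, threads: int) -> float:
    """Speedup divided by thread count under Amdahl's law."""
    t1 = serial_s + parallel_s               # run time on one thread
    tp = serial_s + parallel_s / threads     # run time on `threads` threads
    speedup = t1 / tp
    return speedup / threads

# More force calculations -> more parallel work -> higher efficiency.
for work in (60, 600, 6000):
    eff = mt_efficiency(serial_s=30, parallel_s=work, threads=12)
    print(f"parallel work {work:>5} s -> efficiency {eff:.0%}")
```

In this model, shrinking the serial part raises efficiency at every work size, which is consistent with why faster setup of initial conditions could help, though the real gains depend on the actual code.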

Jeff

DJStarfox
Joined: 29 Sep 10
Posts: 53
Credit: 844,441
RAC: 2

Message 59229 - Posted: 5 Jul 2013, 4:01:39 UTC - in response to Message 59180.
Last modified: 5 Jul 2013, 4:04:40 UTC

Does this explain why I was given so much credit for a few workunits earlier? I posted about it, but no one has shed any light on the observation.
http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=3277

NOTE: All those WUs have been deleted from the server. However, if you look at my credit, you'll see my MW credits more than doubled in 1 day. This was due to completing a single WU.
http://www.allprojectstats.com/showuser.php?projekt=61&id=126741

Edit: I'm also reading about credit system problems from June. Perhaps something is going on with the credit system?
http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=3294

Jeffery M. Thompson
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 23 Sep 12
Posts: 145
Credit: 10,781,192
RAC: 6,350

Message 59230 - Posted: 5 Jul 2013, 5:56:10 UTC
Last modified: 5 Jul 2013, 12:22:11 UTC

The work units you posted earlier were created before we changed the calculation, so they would have used the same estimate as before.

The change happened on July 1st UTC (around 8 pm June 30 EST).
So only work units created after that would have a different fpops estimate.
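To check whether a given work unit falls before or after the change, the cutoff can be converted to UTC. This sketch assumes "8 pm June 30 EST" means US Eastern local time; in late June the US East Coast observes daylight time (UTC-4), which puts the change at about 00:00 July 1 UTC. Read literally as standard time (UTC-5), it would be about 01:00 UTC; either way the date is July 1st UTC. The helper function is hypothetical, and the cutoff is approximate, as in the post.

```python
from datetime import datetime, timedelta, timezone

# Assumption: US Eastern daylight time (UTC-4) in late June.
EDT = timezone(timedelta(hours=-4), "EDT")

# Approximate time of the fpops change ("around 8 pm June 30").
change_local = datetime(2013, 6, 30, 20, 0, tzinfo=EDT)
change_utc = change_local.astimezone(timezone.utc)
print(change_utc.isoformat())  # 2013-07-01T00:00:00+00:00

def created_after_change(created_utc: datetime) -> bool:
    """Hypothetical helper: True if a work unit was created at or after the change."""
    return created_utc >= change_utc
```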


As for the credit system, we were made aware of the 0.64-credit work units and are looking at why this is happening, so we don't have an answer yet. We are looking at examples from before the fpops estimate change, but we are also monitoring the current work units.

kurt
Joined: 17 Nov 10
Posts: 12
Credit: 195,554,899
RAC: 152,228

Message 59233 - Posted: 5 Jul 2013, 13:33:25 UTC

I still get computation errors on one of my computers, which is running 2 6950s. However, the other one, running a single 7850, has never had a problem. Any ideas?

Kurt

Jeffery M. Thompson
Volunteer moderator
Project developer
Project tester
Project scientist
Avatar
Send message
Joined: 23 Sep 12
Posts: 145
Credit: 10,781,192
RAC: 6,350

Message 59235 - Posted: 5 Jul 2013, 13:48:24 UTC

If you could post a work unit or two, we could look at the stderr output that was returned.

Most likely it is the stack overflow; we have a patch for it in testing and are trying to get it out ASAP.

The two machines may simply differ in the resources available before the issue is exposed.

With more details, though, we can be more definitive.


Jeff

Alinator
Send message
Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0

Message 59238 - Posted: 5 Jul 2013, 14:36:10 UTC - in response to Message 59235.

Hmmm...

I've been looking over Kurt's host, and it looks more like a kernel-mode driver GPF to me.




Copyright © 2017 AstroInformatics Group