nbody fpops calculation updated
Joined: 30 Aug 07 | Posts: 2046 | Credit: 26,480 | RAC: 0
It looks like our fpops calculation for the nbody simulation was way too high and was causing problems. I've updated the work generator, so newly generated workunits should have an estimated fpops of approximately 100x less. Let us know how they're working. --Travis
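For context, the BOINC client derives its estimated runtime roughly as rsc_fpops_est divided by the host's effective floating-point speed, so cutting the estimate by ~100x shrinks the predicted runtimes by about the same factor. A minimal sketch of that relationship, using made-up host-speed and fpops numbers rather than actual project values:

```c
/* Rough sketch (not the project's actual generator code) of how a
 * BOINC-style runtime estimate follows from rsc_fpops_est.
 * The host speed and fpops values below are invented illustrations. */
#include <stdio.h>

int main(void) {
    double host_flops    = 3.0e9;            /* hypothetical effective FLOPS of one core */
    double old_fpops_est = 1.0e15;           /* hypothetical old per-workunit estimate   */
    double new_fpops_est = old_fpops_est / 100.0;  /* ~100x smaller, as described above  */

    /* Estimated duration is roughly fpops_est / host speed. */
    printf("old estimate: %.1f hours\n", old_fpops_est / host_flops / 3600.0);
    printf("new estimate: %.1f hours\n", new_fpops_est / host_flops / 3600.0);
    return 0;
}
```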
Joined: 7 Jun 08 | Posts: 464 | Credit: 56,639,936 | RAC: 0
Were you talking about problems like this?
Joined: 23 Sep 12 | Posts: 159 | Credit: 16,977,106 | RAC: 0
Those work units appear to predate Travis's change, and the errors look like the stack overflow you reported earlier, which we are currently working to get a patch out for. The change to fpops_est affects the estimated time we send out with new work units, so it only applies to work units generated after Travis made the change, i.e. dated July 1st UTC or later. Jeff
Joined: 26 May 11 | Posts: 32 | Credit: 43,959,896 | RAC: 0
Thanks for the update on the estimated-time-to-complete values. From my observations they are much improved. The only variance is that some tasks take 5-6 times longer than estimated, which is fairly insignificant: the original estimate may be 2.5 minutes, with actual run times of 15 minutes.

I have a question based on observing total installed CPU efficiency. For this, I am basing my comments on a computer that has 12 cores. Most MT tasks run at an average of roughly 80% total efficiency as observed in the Windows Resource Monitor. Logic therefore suggests it would be better to run 12 separate nbody tasks at roughly 100% rather than a single MT task at 80%? Consider also that I have 2 GPU cards installed, and with those cards running at roughly 95% GPU load (as observed with GPU-Z), it is rare for the total CPU load to get above 85%.

I look forward to any discussion about MT tasks versus single-run tasks. I suppose the answer is to compare average run times versus CPU seconds used. Is there any current review of these results?
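One back-of-envelope way to frame the MT-versus-single-task question is to compare useful core-seconds delivered per wall-clock second under each setup. The sketch below reuses the 80% and 100% utilisation figures and the 12-core machine from the post above; none of it is measured MilkyWay@home data:

```c
/* Back-of-envelope comparison of the two scheduling choices discussed
 * above; the utilisation figures come from the post, the rest is an
 * illustrative assumption. */
#include <stdio.h>

int main(void) {
    int    cores         = 12;
    double mt_efficiency = 0.80;   /* one 12-way MT task, ~80% total CPU use */
    double st_efficiency = 1.00;   /* twelve single-thread tasks, ~100% each */

    double mt_throughput = cores * mt_efficiency;  /* useful core-seconds per wall second */
    double st_throughput = cores * st_efficiency;

    printf("MT task:           %.1f useful core-s per wall-s\n", mt_throughput);
    printf("12 single tasks:   %.1f useful core-s per wall-s\n", st_throughput);
    printf("ratio (single/MT): %.2f\n", st_throughput / mt_throughput);
    return 0;
}
```

By this naive utilisation measure the single-thread tasks come out ahead, which is essentially the question the post raises; the reply below explains why the picture changes once per-work-unit differences are taken into account.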
Joined: 23 Sep 12 | Posts: 159 | Credit: 16,977,106 | RAC: 0
On the first part, about the estimate being off: we changed the calculation to get a better order-of-magnitude estimate. We have the data to fine-tune it; the hard part is getting it mostly right, since the input parameters can vary across a large range and still be valid. We suspect we will go through one or two more iterations, each a finer-grained improvement, and we hope the estimates are now closer and are helping with other scheduling issues.

As for the second part: some work units generate more force equations than others. When there are lots of force equations, MT will be better. So not all work units are the same, but in aggregate MT does appear to be better than non-MT. We are also looking at optimizing how we set up initial conditions to improve efficiency.

Jeff
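One way to picture why MT only pays off when a work unit generates enough force equations: the per-timestep threading overhead is roughly fixed, while the force work grows with the number of bodies, so multiple threads only help once there is enough work to amortise that overhead. The sketch below uses invented cost constants purely for illustration; they are not MilkyWay@home measurements:

```c
/* Illustrative sketch: fixed per-step synchronisation cost versus force
 * work that grows with body count. All constants are invented. */
#include <stdio.h>

int main(void) {
    double per_pair_cost = 1.0e-8;   /* hypothetical seconds per force pair     */
    double sync_overhead = 1.0e-3;   /* hypothetical per-step threading cost    */
    int    threads       = 12;

    long long bodies[] = {1000, 10000, 100000};
    for (int i = 0; i < 3; i++) {
        long long n     = bodies[i];
        double    pairs = 0.5 * (double)n * (double)(n - 1);  /* naive O(N^2) pair count */
        double serial   = pairs * per_pair_cost;
        double parallel = serial / threads + sync_overhead;
        printf("N=%-7lld  serial %.3fs  %d-thread %.3fs  speedup %.1fx\n",
               n, serial, threads, parallel, serial / parallel);
    }
    return 0;
}
```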
Joined: 29 Sep 10 | Posts: 54 | Credit: 1,386,559 | RAC: 0
Does this explain why I was given so much credit for a few workunits earlier? I posted about it, but no one has shed any light on the observation. http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=3277

NOTE: All those WU have been deleted from the server. However, if you look at my credit, you'll see my MW credits more than doubled in 1 day. This was due to completing a single WU. http://www.allprojectstats.com/showuser.php?projekt=61&id=126741

Edit: I'm also reading about credit system problems from June. Perhaps something is going on with the credit system? http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=3294
Joined: 23 Sep 12 | Posts: 159 | Credit: 16,977,106 | RAC: 0
The work units you posted earlier were created before we changed the calculation, so they would have used the old estimate. The change happened on July 1st UTC (around 8 pm June 30 EST), so only work units created after that will have a different fpops estimate.

As for the credit system: we were made aware of the 0.64-credit work units and are looking into why this is happening, so we don't have an answer yet. There are examples from before the fpops estimate change that we are looking at, and we are also monitoring the current work units.
Joined: 17 Nov 10 | Posts: 12 | Credit: 482,514,682 | RAC: 5,708
I still get computation errors on one of the computers, which is running two 6950s. However, the other, running a single 7850, has never had a problem. Any ideas? Kurt
Joined: 23 Sep 12 | Posts: 159 | Credit: 16,977,106 | RAC: 0
If you could post a work unit or two, we could look at the stderr returned. Most likely it is the stack overflow that we have a patch in testing for and are trying to get out ASAP. It could just be a difference in resources between the two machines determining when the issue is exposed; with more details we can be more definitive. Jeff
Joined: 7 Jun 08 | Posts: 464 | Credit: 56,639,936 | RAC: 0
Hmmm... I've been looking over Kurt's host, and it's looking more like a driver kernel-mode GPF to me.