FLOPS Estimate

Paul D. Buck

Joined: 12 Apr 08
Posts: 621
Credit: 161,934,067
RAC: 0
Message 9680 - Posted: 4 Feb 2009, 22:12:03 UTC

I think we still have an issue with the flops estimate.

I had a long queue of tasks that was going to blow the deadline so I fiddled with the STD to force cleaning my queue. After I got all tasks done I reset the project and lo and behold I D/L about 10 tasks of stripe s20 and s21 with an estimated time to complete of 6:03 minutes ...

These tasks on my Mac Pro take about 25 to 30 min to complete ... so the number seems to be off by at least a factor of 5 ... I know it will settle in when I do process them, but, this will obviously happen on each reset so ...
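For concreteness, here is a minimal sketch of the arithmetic behind that gap, assuming the usual simplified BOINC model (estimated runtime = rsc_fpops_est / benchmark FLOPS x DCF; the real client also folds in on-fraction, CPU efficiency, and so on). All numbers are hypothetical, picked only to reproduce the 6-minute vs. 25-30-minute mismatch:

    # Simplified model of the BOINC client's runtime estimate.
    # All values below are hypothetical, not MilkyWay@home's actual settings.

    rsc_fpops_est = 9.0e11   # hypothetical server-side FLOP estimate per task
    p_fpops       = 2.5e9    # hypothetical benchmarked FLOPS of one core
    dcf           = 1.0      # duration correction factor; reset to 1.0 on project reset

    est = rsc_fpops_est / p_fpops * dcf
    print(f"estimated runtime: {est / 60:.1f} min")   # ~6 min at DCF = 1

    # As results complete, the client pulls DCF toward actual/estimated:
    actual = 27 * 60                                  # ~27 min observed on the Mac Pro
    dcf = actual / (rsc_fpops_est / p_fpops)
    print(f"settled DCF: {dcf:.1f}")                  # ~4.5 -- the 'factor of 5'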

Thought I would let you know what I just experienced ... YMMV :)
ID: 9680
Alinator

Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0
Message 9686 - Posted: 4 Feb 2009, 23:05:07 UTC - in response to Message 9680.  
Last modified: 4 Feb 2009, 23:06:14 UTC

> I think we still have an issue with the flops estimate.
>
> I had a long queue of tasks that was going to blow the deadline so I fiddled with the STD to force cleaning my queue. After I got all tasks done I reset the project and lo and behold I D/L about 10 tasks of stripe s20 and s21 with an estimated time to complete of 6:03 minutes ...
>
> These tasks on my Mac Pro take about 25 to 30 min to complete ... so the number seems to be off by at least a factor of 5 ... I know it will settle in when I do process them, but, this will obviously happen on each reset so ...
>
> Thought I would let you know what I just experienced ... YMMV :)


Agreed. It is unwise to set the FPOP estimate to reflect the apparent power of the fastest hosts.

The reason is that, as you pointed out, a reset or new attach will start out with a TDCF of one, which will almost always result in an overfetch for the majority of hosts.
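To make the overfetch concrete, a rough sketch with a hypothetical work buffer and the runtimes from the post above (both numbers assumed, not measured):

    # Why a too-low estimate at TDCF = 1 over-fetches (hypothetical numbers).

    buffer_seconds = 2 * 86400   # user keeps a 2-day work buffer
    est_duration   = 6 * 60      # per-task estimate at TDCF = 1 (too low)
    true_duration  = 27 * 60     # what the tasks actually take

    tasks_fetched = buffer_seconds // est_duration    # ~480 tasks
    backlog_days  = tasks_fetched * true_duration / 86400
    print(f"{tasks_fetched} tasks fetched = {backlog_days:.1f} days of real work")
    # ~9 days of work queued against the deadline -> blown deadlines
    # until the TDCF settles back up.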

Personally, I would have set the estimate to 2 or 3 E15.

One other point: the bounds value should be set to something a bit higher than what the estimate is set to. This has caused problems on other projects, with tasks being aborted on the time limit when they didn't have to be.

Alinator
ID: 9686
Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 9691 - Posted: 4 Feb 2009, 23:36:21 UTC - in response to Message 9686.  

> > I think we still have an issue with the flops estimate. [...]
>
> Agreed. It is unwise to set the FPOP estimate to reflect the apparent power of the fastest hosts.
>
> The reason is that, as you pointed out, a reset or new attach will start out with a TDCF of one, which will almost always result in an overfetch for the majority of hosts.
>
> Personally, I would have set the estimate to 2 or 3 E15.
>
> One other point: the bounds value should be set to something a bit higher than what the estimate is set to. This has caused problems on other projects, with tasks being aborted on the time limit when they didn't have to be.
>
> Alinator


I think the bound is set to 100x what the estimate is right now, so that shouldn't be a problem. I can up the estimate a bit but I think once things settle down it should be pretty accurate.
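For the curious, a sketch of what that 100x bound buys, using the same simplified model as above (a task is aborted once its runtime exceeds roughly rsc_fpops_bound / host FLOPS; the host numbers are hypothetical):

    # Simplified model of the client-side time-limit check.
    # All values hypothetical.

    rsc_fpops_est   = 9.0e11
    rsc_fpops_bound = 100 * rsc_fpops_est   # bound = 100x estimate, per the post
    p_fpops         = 2.5e9

    max_runtime_h = rsc_fpops_bound / p_fpops / 3600
    print(f"forced-abort limit: {max_runtime_h:.0f} h")   # ~10 h of headroom
    # With a bound only slightly above a too-low estimate, slower hosts would
    # hit this limit and see 'maximum elapsed time exceeded' aborts instead.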
ID: 9691
Paul D. Buck

Joined: 12 Apr 08
Posts: 621
Credit: 161,934,067
RAC: 0
Message 9694 - Posted: 5 Feb 2009, 0:01:20 UTC

It may also be because I am using 5.10.45 on that machine (the Mac Pro); several of the other projects I want to work with seem resistant to changing the def file that contains the ID string that the new 6.x clients use for the Intel Macs ...

If you upgrade to the 6 series, then not only can you not fetch work, you also cannot trickle up, nor can you report tasks that you have completed on the now-illegal host ...

I know it is carping, but this is the kind of issue that bites you when you don't do engineering and just hack at the code ...
ID: 9694
Alinator

Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0
Message 9698 - Posted: 5 Feb 2009, 0:34:52 UTC - in response to Message 9691.  
Last modified: 5 Feb 2009, 0:36:58 UTC



> I think the bound is set to 100x what the estimate is right now, so that shouldn't be a problem. I can up the estimate a bit but I think once things settle down it should be pretty accurate.


Ahhhh yes, the bound is set to 100X now.

Actually, I meant dividing the current estimates by 4 or 5. Remember that a TDCF of 1 means the task will take the same amount of time the estimate says. So if you set the FPOP estimate based on what the fastest or most optimized hosts can do, then they will have TDCFs close to 1 and everyone else will be greater than one.

This is the scenario that can lead to overfetches and blown deadlines when estimated runtimes change (as we just saw).
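A quick sketch of that re-centering, with three hypothetical hosts (runtimes and FLOPS assumed, not measured):

    # Effect of sizing rsc_fpops_est for the fastest host vs. a typical one.
    # All hosts and runtimes hypothetical.

    p_fpops = 2.5e9
    est_tuned_to_fastest = 9.0e11   # yields a 360 s estimate at TDCF = 1
    runtimes = {"fast/optimized": 360, "typical": 1620, "slow": 3240}  # seconds

    for name, secs in runtimes.items():
        tdcf = secs / (est_tuned_to_fastest / p_fpops)
        print(f"{name:15s} TDCF = {tdcf:.1f}")   # 1.0 / 4.5 / 9.0

    # Dividing the estimate by ~4-5 centers the typical host near TDCF = 1;
    # fast hosts drop below 1 (harmless) and fresh attaches over-fetch far less.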

Alinator
ID: 9698

