Message boards :
Number crunching :
New WU Length?
Send message Joined: 9 Jul 08 Posts: 85 Credit: 44,842,651 RAC: 0 |
"My antique Powermac G4 has taken 14 hours to get 20% through a WU; at this rate, it will never finish the queued WU's that used to take only 20 minutes each."

I wouldn't say it's necessary to drop MW from the Mac. Once BOINC adjusts itself to the new run times, it will likely only download one or two tasks at a time. The process will finish much more quickly if the project admins change the estimated run time on the WU's, but BOINC is capable of adjusting on its own even if they don't. When you get WU's that are nearly past their deadline, just abort them; the computer should then download 1 or 2 tasks and you'll be good. |
Send message Joined: 8 Oct 07 Posts: 289 Credit: 3,690,838 RAC: 0 |
"My antique Powermac G4 has taken 14 hours to get 20% through a WU; at this rate, it will never finish the queued WU's that used to take only 20 minutes each."

Adding to what Thunder said... Nate has already said that Travis will increase the deadlines soon, so these short deadlines won't last long ;) |
Send message Joined: 22 Nov 07 Posts: 285 Credit: 1,076,786,368 RAC: 0 |
The biggest problem I am seeing with the larger WU's is timeouts! On the hosts that I have left attached to the project, "they" are used to 6-minute WU's, so now all the WU's are aborting. I accidentally detached - my bad. The host is re-attached now.

<core_client_version>5.10.30</core_client_version>
<message>Maximum CPU time exceeded</message>

http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=20612 |
Send message Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
What I have been doing to address that is to go in and hand edit the FPOP estimate and limits for the 3720, 21, and 30 series tasks my slow hosts picked up: by a factor of 60 for the 20's and 21's, and a factor of 30 for the 30's. That should take care of most of the issues, and it doesn't require a big adjustment to the TDCF (task duration correction factor). That's desirable because, after the massive TDCF upshift as it stands, if you pick up an old timed-out task on a resend, its correspondingly huge runtime estimate will tend to shut down work downloads for MW until it clears the queue. This shouldn't be much of a problem, but it would look like anomalous BOINC behaviour to the casual observer.

Alinator |
Send message Joined: 22 Nov 07 Posts: 285 Credit: 1,076,786,368 RAC: 0 |
"What I have been doing to address that is to go in and hand edit the FPOP estimate and limits for the 3720, 21, and 30 series tasks my slow hosts picked up: by a factor of 60 for the 20's and 21's, and a factor of 30 for the 30's."

Huh? Where would one go to "hand edit" the FPOP estimate and limits for what - the 3720's, 21's, and 30's? Maybe it would just be easier for the project admins to adjust the time-to-crunch factor on the WU's? |
Send message Joined: 4 Oct 07 Posts: 43 Credit: 53,898 RAC: 0 |
"Adding to what Thunder said... Nate has already said that Travis will increase the deadlines soon, so these short deadlines won't last long ;)"

Yeah, we'll be making some changes to the deadlines and the number of WUs downloaded at a time ASAP. We'll just have to deal with it until then, sorry.

~Nate~ |
Send message Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
You have to go in and edit the relevant elements in the <workunit> sections for the tasks in client_state. And yes, it would be easier to fix the parameters on the project side, but that doesn't help any for the work already on board. ;-)

I suggest that if you only crunch MW, don't bother. However, when changes like this happen on hosts which run more than one project or have heavily biased resource shares, it can have unintended (and undesired) consequences for the other projects. My main point is that there are manual workarounds to deal with the fallout and mitigate the effects.

<edit> I see Nathan is around. :-) Like I said above, this is not a major problem; I just wanted to let people know that you can take command and work out the situation with a little thought and effort on your part, without wholesale work dumping, resets, or detaches.

Alinator |
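The hand edit described above can be sketched in code. This is a minimal sketch only: it assumes the standard `<rsc_fpops_est>` and `<rsc_fpops_bound>` elements found in BOINC's client_state.xml `<workunit>` sections, and the fragment and scale factor shown are made up for illustration, not taken from a real host.

```python
import re

def scale_fpops(client_state_xml: str, factor: float) -> str:
    """Multiply every <rsc_fpops_est> and <rsc_fpops_bound> value by `factor`.

    Works on the raw text so the rest of the file is left untouched.
    Stop the BOINC client before editing and restart it afterwards.
    """
    def bump(match):
        tag, value = match.group(1), float(match.group(2))
        return "<{0}>{1:.6e}</{0}>".format(tag, value * factor)

    return re.sub(
        r"<(rsc_fpops_est|rsc_fpops_bound)>([0-9.eE+-]+)</\1>",
        bump,
        client_state_xml,
    )

# Illustrative <workunit> fragment, scaled by 60 as suggested for the
# gs_3720/gs_3721 series tasks (values here are invented):
fragment = (
    "<workunit>\n"
    "  <rsc_fpops_est>1.000000e+12</rsc_fpops_est>\n"
    "  <rsc_fpops_bound>1.000000e+13</rsc_fpops_bound>\n"
    "</workunit>\n"
)
print(scale_fpops(fragment, 60))
```

In practice you would read client_state.xml, run it through a scaler like this for just the affected task series, and write it back with the client stopped.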
Send message Joined: 22 Nov 07 Posts: 285 Credit: 1,076,786,368 RAC: 0 |
Well, it is a problem when nearly all of the tasks abort...

7/16/2008 3:42:15 PM|Milkyway@home|Aborting task gs_3721282_1216228552_2604228_0: exceeded CPU time limit 14784.428374
7/16/2008 3:45:00 PM|Milkyway@home|Aborting task gs_3720282_1216091394_3660_1: exceeded CPU time limit 14784.428374 |
Send message Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
Sounds like your BOINC settings.

Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. |
Send message Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
What kind of host is it (or provide a link to it)? Based on what I've seen, failing on Max Time Exceeded isn't a problem for P4-class (or equivalent) hosts and newer.

All I can say at this point is that none of the tasks I modified have failed on my slugs, and they have far more than ~14 kSecs on the currently running ones. One's been running for over 87K and the other for 20K.

So I'd have to say your choices are limited to manually fixing the new work so it will run to completion, or going NNT (No New Tasks) until Nathan and Travis can figure out and set the new BOINC parameters.

@Banditwolf: User preference settings wouldn't have any effect on a Max Time Exceeded failure.

Alinator |
Send message Joined: 22 Nov 07 Posts: 285 Credit: 1,076,786,368 RAC: 0 |
"What kind of host is it (or provide a link to it)? Based on what I've seen, failing on Max Time Exceeded isn't a problem for P4-class (or equivalent) hosts and newer."

The host was posted in a previous post, but here it is again. This is not the only one, though; I have several, all highly overclocked duos or quads.

http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=20612

I have since modified the xml file per the instructions, so I hope it corrects the problem. But I am having similar problems on several other hosts, and I just don't have the time to edit each WU's estimated time to complete on each host... I will just have to let them crunch till they fail and lose the hours of time... yeah, right! |
Send message Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
"@Banditwolf: User preference settings wouldn't have any effect on a Max Time Exceeded failure."

I was thinking of the option where you can switch between projects. (I don't use it, though.)

Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. |
Send message Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
Oops... Sorry, I noticed that after I had posted.

After looking over what I can see on the Host Summary, this is kind of strange. I would not expect this host to have Max Time problems generally, even with the current parameters. What were your runtimes like on the old work?

Also, there seems to be significant variation in the claimed credit, even though they were all 3720's and 21's and ran for virtually the same amount of time. One would expect that to be about the same, since they are the same type of work. Curious. This might indicate there's another underlying latent issue on the hosts showing the behaviour. One which comes to mind here: being highly overclocked, it might be that the benchmark values are causing the problem. Remember that Max Time is set by dividing the FPOP bound by the floating point benchmark, and is calculated by the core client itself, IIRC. You might be able to work around that by manually setting the floating point benchmark to a somewhat lower value, and then having BOINC start with the 'Don't Run Benchmarks' option set.

However, that doesn't account for the claimed credit discrepancy on the same type of work. My initial thought there leans to something causing a net CPU efficiency decrease.

In any event, this is an example of where poofing the tasks immediately makes it a lot harder to glean some insight into what's going on, and I agree manually 'hacking' the input parameters in client_state is not a really viable alternative on fast hosts, even for the most determined, hard-core cruncher. ;-)

Alinator |
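The Max Time mechanism being discussed comes down to one division: the client aborts a task once its CPU time exceeds the task's FPOP bound divided by the host's benchmarked floating-point speed. A rough sketch, with a hypothetical benchmark figure and a bound sized for the old short WU's (none of these numbers are from a real host), shows why a 60x-longer task trips the limit:

```python
def max_cpu_seconds(rsc_fpops_bound: float, benchmark_flops: float) -> float:
    """BOINC's task abort limit: the FPOP bound divided by the host's
    measured floating-point speed (from the whetstone benchmark)."""
    return rsc_fpops_bound / benchmark_flops

# Illustrative numbers only: a ~2.5 GFLOPS benchmark and a bound sized
# for the old ~6-minute WU's with a 10x safety margin.
benchmark = 2.5e9                      # p_fpops, in FLOPS
old_bound = 2.5e9 * 360 * 10           # enough for ~1 hour of CPU time
limit = max_cpu_seconds(old_bound, benchmark)
print(limit)                           # 3600.0 seconds

# A WU that really needs 60x the old runtime (~6 hours of CPU) blows
# past the limit, so the client aborts it: "Maximum CPU time exceeded".
actual_runtime = 360 * 60
print(actual_runtime > limit)          # True
```

This also shows why lowering the benchmark value (or raising the bound) raises the limit: either change makes the quotient bigger.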
Send message Joined: 22 Nov 07 Posts: 285 Credit: 1,076,786,368 RAC: 0 |
Run times on the shorter WU's were approx 4.5-5 min each, if I remember. I know there is a slight issue with the memory timings on this host, as well as a slight (85°C) heat problem. But the system is set to ignore the heat until the cores melt :) It runs other projects just fine, including QMC, and only started to abort MW with the longer WU's.

After a manual edit, the WU's are now claiming 70 hours to run, so we should be fine, and after BOINC adjusts itself to the WU times I think this host will run fine. But I am having the same issue on my other hosts as well - so, like I said, they will just end up aborting WU's until BOINC corrects the run time factors - normally about 30-40 WU's, I think. |
Send message Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
OK, now it's starting to make sense. Given your recollection of the runtimes, it looks like, with the current project-set parameters and that host's current time and performance metrics, the Max Time calculation was coming out 3 to 4 kSecs short of what the new work needs.

One other thing to keep in mind, though: IIRC, the TDCF is not updated when tasks fail, so the other hosts won't auto-correct on their own. Since yours are fast hosts which go through a lot of work (relatively speaking), I would suggest just bumping the TDCF up manually and then living with the slow-down correction once the team makes the project-side parameter changes.

Alinator |
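The TDCF bump works because the client's runtime estimate is, roughly, the task's FPOP estimate over the benchmark speed, scaled by the project's duration correction factor. A sketch with invented numbers (the 2.5 GFLOPS benchmark and estimate are illustrative assumptions, not measured values):

```python
def estimated_runtime(rsc_fpops_est: float, benchmark_flops: float,
                      dcf: float) -> float:
    """Client-side runtime estimate: FPOP estimate over benchmark speed,
    scaled by the project's duration correction factor (TDCF)."""
    return rsc_fpops_est / benchmark_flops * dcf

est = 2.5e9 * 360          # an estimate sized for the old ~6-minute WU's
bench = 2.5e9              # hypothetical ~2.5 GFLOPS benchmark

# With the TDCF still at 1.0, the client thinks a task takes 6 minutes...
print(estimated_runtime(est, bench, 1.0) / 60)     # 6.0 minutes

# ...so bumping <duration_correction_factor> to 60 by hand makes the
# estimate match the new ~6-hour reality and throttles further downloads.
print(estimated_runtime(est, bench, 60.0) / 3600)  # 6.0 hours
```

That is also why a failed task is a problem: if the TDCF only moves when tasks validate, a host whose tasks all abort keeps its old factor and keeps fetching far more work than it can finish.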
Send message Joined: 31 Mar 08 Posts: 1 Credit: 31,485 RAC: 0 |
About 5 times longer to complete??? Make that more like 60 times... Before, my units completed in about 11 minutes on average, pretty close to the estimate. Now, the estimates are 10:55:12, or almost 11 hours. Minutes have become hours... With a queue of 20 units, it will take some 10 days to complete, if there is absolutely nothing else running. With deadlines of about 3 days, I will have to cancel most units, about 12-15 out of each 20. |
Send message Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
Yes, the originally stated 5 to 10x multiplier was based on some very early reports when the new WU's first came out. I updated that later in the thread when it became apparent that 30-60x was closer to the truth.

As a side note, 30-60x seems to be holding for all my slugs except the G3, which seems to be taking it more on the nose than the Intels or AMD's.

Alinator |
Send message Joined: 30 Aug 07 Posts: 125 Credit: 207,206 RAC: 0 |
May I request that, if you're doing resends of these long tasks, their deadline be somewhat longer than 12 hours away? ;-) One of my PCs has just been busy churning out this Milkyway task for the past 13 hours on high priority. I have no clue if it was in on time, as I wasn't here to check it and it's been purged from the DB already.

Jord. The BOINC FAQ Service. |
Send message Joined: 11 Mar 08 Posts: 28 Credit: 818,194 RAC: 0 |
Over here, the old work units were taking about 7 minutes and the newer ones (gs_3720282) are taking about 7 hours. That's a 60-fold increase by my reckoning. I've had one of the shorter-length work units on my Athlon - the gs_3730382 series - and it finished in 3h 46m 39s, making it about 32 times longer than the old work units. My slowest machine - a Duron - is running the long work units at 12h 40m or thereabouts, which is about 57 to 58 times longer than the short ones. I've set the cache to two days on that box, which is keeping it nicely in work with a bit of leeway on the deadlines in case anything untoward happens.

I've just had a quick glance back through the logs for both boxes, and there appear to be no problems with server contact whatsoever. So, it looks like everything is working nicely.

Rob. |
Send message Joined: 4 Apr 08 Posts: 2 Credit: 21,246,900 RAC: 0 |
Milkyway@home was a very cool project with short WU's. Well, 7 minutes is maybe a little bit too short, but 10 hours is a little bit too long! 2 or 3 hours would be a better choice, I think (don't forget the slower PC's). |
©2024 Astroinformatics Group