Message boards :
Number crunching :
New WU Length?
Send message Joined: 9 Jul 08 Posts: 85 Credit: 44,842,651 RAC: 0 |
"My antique Powermac G4 has taken 14 hours to get 20% through a WU; at this rate, it will never finish the queued WU's that used to take only 20 minutes each."

I wouldn't say it's necessary to drop MW from the Mac. Once BOINC adjusts itself to the new run times, it will likely only download one or two tasks at a time. The process will finish much more quickly if the project admins change the estimated run time on the WU's, but BOINC is capable of adjusting on its own even if they don't. When you get WU's that are nearly past their deadline, just abort them; the computer should then download 1 or 2 tasks and you'll be good. |
Send message Joined: 8 Oct 07 Posts: 289 Credit: 3,690,838 RAC: 0 |
"My antique Powermac G4 has taken 14 hours to get 20% through a WU; at this rate, it will never finish the queued WU's that used to take only 20 minutes each."

Adding to what Thunder said... Nate has already said that Travis will increase the deadlines soon, so these short deadlines won't last long ;) |
Send message Joined: 22 Nov 07 Posts: 285 Credit: 1,076,786,368 RAC: 0 |
The biggest problem I am seeing with the larger WU's is timeouts! On the hosts that I have left attached to the project, "they" are used to 6-minute WU's, so now all the WU's are aborting. I accidentally detached - my bad. The host is re-attached now.

<core_client_version>5.10.30</core_client_version>
<message>Maximum CPU time exceeded</message>

http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=20612 |
Send message Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
What I have been doing to address that is to go in and hand edit the FPOP estimate and limits for the 3720, 21, and 30 series tasks my slow hosts picked up: by a factor of 60 for the 20's and 21's, and a factor of 30 for the 30's. That should take care of most of the issues, and it doesn't require a big adjustment to the TDCF (task duration correction factor). That's desirable because, after the massive TDCF upshift as it stands, if you pick up an old timed-out task on a resend, its correspondingly huge runtime estimate will tend to shut down work downloads for MW until it clears the queue. This shouldn't be much of a problem, but it would look like anomalous BOINC behaviour to the casual observer.

Alinator |
Send message Joined: 22 Nov 07 Posts: 285 Credit: 1,076,786,368 RAC: 0 |
"What I have been doing to address that is to go in and hand edit the FPOP estimate and limits for the 3720, 21, and 30 series tasks my slow hosts picked up: by a factor of 60 for the 20's and 21's, and a factor of 30 for the 30's."

Huh? Where would one go to "hand edit" the FPOP estimate and limits for what - the 3720's, 21's, and 30's? Maybe it would just be easier for the project admins to adjust the time-to-crunch factor on the WU's? |
Send message Joined: 4 Oct 07 Posts: 43 Credit: 53,898 RAC: 0 |
"Adding to what Thunder said... Nate has already said that Travis will increase the deadlines soon, so these short deadlines won't last long ;)"

Yeah, we'll be making some changes to the deadlines and the number of WUs downloaded at a time ASAP. We'll just have to deal with it until then, sorry.

~Nate~ |
Send message Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
You have to go in and edit the relevant elements in the <workunit> sections for the tasks in client_state. And yes, it would be easier to fix the parameters on the project side, but that doesn't help any for the work already on board. ;-)

I suggest that if you only crunch MW, don't bother. However, when changes like this happen on hosts which run more than one project or have heavily biased resource shares, it can have unintended (and undesired) consequences for the other projects. My main point is that there are manual workarounds to deal with the fallout and mitigate the effects.

<edit> I see Nathan is around. :-) Like I said above, this is not a major problem; I just wanted to let people know that you can take command and work out the situation with a little thought and effort on your part, without wholesale work dumping, resets, or detaches.

Alinator |
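The hand edit described above can be sketched in code. This is a minimal sketch only: it assumes the standard `<rsc_fpops_est>` and `<rsc_fpops_bound>` elements found in BOINC's client_state.xml `<workunit>` sections, and the fragment and scale factor shown are made up for illustration, not taken from a real host.

```python
import re

def scale_fpops(client_state_xml: str, factor: float) -> str:
    """Multiply every <rsc_fpops_est> and <rsc_fpops_bound> value by `factor`.

    Works on the raw text so the rest of the file is left untouched.
    Stop the BOINC client before editing and restart it afterwards.
    """
    def bump(match):
        tag, value = match.group(1), float(match.group(2))
        return "<{0}>{1:.6e}</{0}>".format(tag, value * factor)

    return re.sub(
        r"<(rsc_fpops_est|rsc_fpops_bound)>([0-9.eE+-]+)</\1>",
        bump,
        client_state_xml,
    )

# Illustrative <workunit> fragment, scaled by 60 as suggested for the
# gs_3720/gs_3721 series tasks (values here are invented):
fragment = (
    "<workunit>\n"
    "  <rsc_fpops_est>1.000000e+12</rsc_fpops_est>\n"
    "  <rsc_fpops_bound>1.000000e+13</rsc_fpops_bound>\n"
    "</workunit>\n"
)
print(scale_fpops(fragment, 60))
```

In practice you would read client_state.xml, run it through a scaler like this for just the affected task series, and write it back with the client stopped.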
Send message Joined: 22 Nov 07 Posts: 285 Credit: 1,076,786,368 RAC: 0 |
Well, it is a problem when nearly all of the tasks abort...

7/16/2008 3:42:15 PM|Milkyway@home|Aborting task gs_3721282_1216228552_2604228_0: exceeded CPU time limit 14784.428374
7/16/2008 3:45:00 PM|Milkyway@home|Aborting task gs_3720282_1216091394_3660_1: exceeded CPU time limit 14784.428374 |
Send message Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
Sounds like your BOINC settings.

Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. |
Send message Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
What kind of host is it (or provide a link to it)? Based on what I've seen, failing on Max Time Exceeded isn't a problem for P4-class (or equivalent) hosts and newer.

All I can say at this point is that none of the tasks I modified have failed on my slugs, and they have far more than ~14 kSecs on the currently running ones. One's been running for over 87K and the other for 20K.

So I'd have to say your choices are limited to manually fixing the new work so it will run to completion, or going NNT (No New Tasks) until Nathan and Travis can figure out and set the new BOINC parameters.

@Banditwolf: User preference settings wouldn't have any effect on a Max Time Exceeded failure.

Alinator |
Send message Joined: 22 Nov 07 Posts: 285 Credit: 1,076,786,368 RAC: 0 |
"What kind of host is it (or provide a link to it)? Based on what I've seen, failing on Max Time Exceeded isn't a problem for P4-class (or equivalent) hosts and newer."

The host was posted in a previous post, but here it is again. This is not the only one, though; I have several, all highly overclocked duos or quads.

http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=20612

I have since modified the xml file per the instructions, so I hope it corrects the problem. But I am having similar problems on several other hosts, and I just don't have the time to edit each WU's estimated time to complete on each host... I will just have to let them crunch till they fail and lose the hours of time... yeah, right! |
Send message Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
"@Banditwolf: User preference settings wouldn't have any effect on a Max Time Exceeded failure."

I was thinking of the option where you can switch between projects. (I don't use it, though.)

Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. |
Send message Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
Oops... Sorry, I noticed that after I had posted.

After looking over what I can see on the Host Summary, this is kind of strange. I would not expect this host to have Max Time problems generally, even with the current parameters. What were your runtimes like on the old work?

Also, there seems to be significant variation in the claimed credit, even though they were all 3720's and 21's and ran for virtually the same amount of time. One would expect that to be about the same, since they are the same type of work. Curious. This might indicate there's another underlying latent issue on the hosts showing the behaviour. One which comes to mind here: being highly overclocked, it might be that the benchmark values are causing the problem. Remember that Max Time is set by dividing the FPOP bound by the floating point benchmark, and is calculated by the core client itself, IIRC. You might be able to work around that by manually setting the floating point benchmark to a somewhat lower value, and then having BOINC start with the 'Don't Run Benchmarks' option set.

However, that doesn't account for the claimed credit discrepancy on the same type of work. My initial thought there leans to something causing a net CPU efficiency decrease.

In any event, this is an example of where poofing the tasks immediately makes it a lot harder to glean some insight into what's going on, and I agree manually 'hacking' the input parameters in client_state is not a really viable alternative on fast hosts, even for the most determined, hard-core cruncher. ;-)

Alinator |
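The Max Time mechanism being discussed comes down to one division: the client aborts a task once its CPU time exceeds the task's FPOP bound divided by the host's benchmarked floating-point speed. A rough sketch, with a hypothetical benchmark figure and a bound sized for the old short WU's (none of these numbers are from a real host), shows why a 60x-longer task trips the limit:

```python
def max_cpu_seconds(rsc_fpops_bound: float, benchmark_flops: float) -> float:
    """BOINC's task abort limit: the FPOP bound divided by the host's
    measured floating-point speed (from the whetstone benchmark)."""
    return rsc_fpops_bound / benchmark_flops

# Illustrative numbers only: a ~2.5 GFLOPS benchmark and a bound sized
# for the old ~6-minute WU's with a 10x safety margin.
benchmark = 2.5e9                      # p_fpops, in FLOPS
old_bound = 2.5e9 * 360 * 10           # enough for ~1 hour of CPU time
limit = max_cpu_seconds(old_bound, benchmark)
print(limit)                           # 3600.0 seconds

# A WU that really needs 60x the old runtime (~6 hours of CPU) blows
# past the limit, so the client aborts it: "Maximum CPU time exceeded".
actual_runtime = 360 * 60
print(actual_runtime > limit)          # True
```

This also shows why lowering the benchmark value (or raising the bound) raises the limit: either change makes the quotient bigger.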
Send message Joined: 22 Nov 07 Posts: 285 Credit: 1,076,786,368 RAC: 0 |
Run times on the shorter WU's were approx 4.5-5 min each, if I remember. I know there is a slight issue with the memory timings on this host, as well as a slight (85°C) heat problem. But the system is set to ignore the heat until the cores melt :) It runs other projects just fine, including QMC, and only started to abort MW with the longer WU's.

After a manual edit, the WU's are now claiming 70 hours to run, so we should be fine, and after BOINC adjusts itself to the WU times I think this host will run fine. But I am having the same issue on my other hosts as well - so, like I said, they will just end up aborting WU's until BOINC corrects the run time factors - normally about 30-40 WU's, I think. |
Send message Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
OK, now it's starting to make sense. Given your recollection of the runtimes, it looks like, with the current project-set parameters and that host's current time and performance metrics, the Max Time calculation was coming out 3 to 4 kSecs short of what the new work needs.

One other thing to keep in mind, though: IIRC, the TDCF is not updated when tasks fail, so the other hosts won't auto-correct on their own. Since yours are fast hosts which go through a lot of work (relatively speaking), I would suggest just bumping the TDCF up manually and then living with the slow-down correction once the team makes the project-side parameter changes.

Alinator |
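The TDCF bump works because the client's runtime estimate is, roughly, the task's FPOP estimate over the benchmark speed, scaled by the project's duration correction factor. A sketch with invented numbers (the 2.5 GFLOPS benchmark and estimate are illustrative assumptions, not measured values):

```python
def estimated_runtime(rsc_fpops_est: float, benchmark_flops: float,
                      dcf: float) -> float:
    """Client-side runtime estimate: FPOP estimate over benchmark speed,
    scaled by the project's duration correction factor (TDCF)."""
    return rsc_fpops_est / benchmark_flops * dcf

est = 2.5e9 * 360          # an estimate sized for the old ~6-minute WU's
bench = 2.5e9              # hypothetical ~2.5 GFLOPS benchmark

# With the TDCF still at 1.0, the client thinks a task takes 6 minutes...
print(estimated_runtime(est, bench, 1.0) / 60)     # 6.0 minutes

# ...so bumping <duration_correction_factor> to 60 by hand makes the
# estimate match the new ~6-hour reality and throttles further downloads.
print(estimated_runtime(est, bench, 60.0) / 3600)  # 6.0 hours
```

That is also why a failed task is a problem: if the TDCF only moves when tasks validate, a host whose tasks all abort keeps its old factor and keeps fetching far more work than it can finish.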
Send message Joined: 31 Mar 08 Posts: 1 Credit: 31,485 RAC: 0 |
About 5 times longer to complete??? Make that more like 60 times... Before, my units completed in about 11 minutes on average, pretty close to the estimate. Now, the estimates are 10:55:12, or almost 11 hours. Minutes have become hours... With a queue of 20 units, it will take some 10 days to complete, if there is absolutely nothing else running. With deadlines of about 3 days, I will have to cancel most units, about 12-15 out of each 20. |
Send message Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
Yes, the originally stated 5 to 10x multiplier was based on some very early reports when the new WU's first came out. I updated that later in the thread when it became apparent that 30-60x was closer to the truth.

As a side note, 30-60x seems to be holding for all my slugs except the G3, which seems to be taking it more on the nose than the Intels or AMD's.

Alinator |
Send message Joined: 30 Aug 07 Posts: 125 Credit: 207,206 RAC: 0 |
May I request that, if you're doing resends of these long tasks, their deadline be somewhat longer than 12 hours away? ;-) One of my PCs has just been busy churning out this Milkyway task for the past 13 hours on high priority. I have no clue if it was in on time, as I wasn't here to check it and it's been purged from the DB already.

Jord. The BOINC FAQ Service. |
Send message Joined: 11 Mar 08 Posts: 28 Credit: 818,194 RAC: 0 |
Over here, the old work units were taking about 7 minutes and the newer ones (gs_3720282) are taking about 7 hours. That's a 60-fold increase by my reckoning. I've had one of the shorter-length work units on my Athlon - the gs_3730382 series - and it finished in 3h 46m 39s, making it about 32 times longer than the old work units. My slowest machine - a Duron - is running the long work units at 12h 40m or thereabouts, which is about 57 to 58 times longer than the short ones. I've set the cache to two days on that box, which is keeping it nicely in work with a bit of leeway on the deadlines in case anything untoward happens.

I've just had a quick glance back through the logs for both boxes, and there appear to be no problems with server contact whatsoever. So, it looks like everything is working nicely.

Rob. |
Send message Joined: 4 Apr 08 Posts: 2 Credit: 21,246,900 RAC: 0 |
Milkyway@home was a very cool project with short WU's. Well, 7 minutes is maybe a little bit too short, but 10 hours is a little bit too long! 2 or 3 hours would be a better choice, I think (don't forget the slower PC's). |
©2024 Astroinformatics Group