Welcome to MilkyWay@home

Won't finish in time



Message boards : Number crunching : Won't finish in time

Beau

Joined: 3 Jan 09
Posts: 270
Credit: 124,346
RAC: 0
Message 27224 - Posted: 7 Jul 2009, 10:59:32 UTC - in response to Message 27198.  

I have run the optimized client in the past, but am not running it currently because of confusion about what would be allowed, and whether results returned by an optimized client would be awarded credit or scrapped depending on which version you were using. In any case, I don't think anyone should have to run an optimized client just so work will finish in time. If the standard client cannot do the work within the assigned time, then some serious changes need to be made to the standard client, the project itself, or both.
ID: 27224 · Rating: 0
The Gas Giant

Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 27229 - Posted: 7 Jul 2009, 11:26:08 UTC - in response to Message 27224.  

I have run the optimized client in the past, but am not running it currently because of confusion about what would be allowed, and whether results returned by an optimized client would be awarded credit or scrapped depending on which version you were using. In any case, I don't think anyone should have to run an optimized client just so work will finish in time. If the standard client cannot do the work within the assigned time, then some serious changes need to be made to the standard client, the project itself, or both.

There is no confusion over what is allowed. Just go to the zslip site.

BOINC will only 'throw out' WUs that haven't been started. If they haven't been started, then it is no skin off anyone's nose if the system is smart enough not to perform 'useless' calculations. I have not seen any recent reports of WUs being cancelled once they had been started.
ID: 27229 · Rating: 0
Beau

Joined: 3 Jan 09
Posts: 270
Credit: 124,346
RAC: 0
Message 27295 - Posted: 8 Jul 2009, 11:38:38 UTC - in response to Message 27229.  

At the time I was using it, there was a great deal of confusion as to which versions would be allowed and which would be granted credit, even if results were returned as "valid". Maybe that has died down some; I haven't looked into it because I was also tired of having to babysit it so much. I have had a bunch of workunits within the past week that were canceled/disallowed because they "would not finish in time", yet they were sent on to someone else. I do not know who is making the decisions here at MW; I am sure it is someone "behind the curtain". These changes are not very good. That is just my opinion, and I am sure some will disagree.
ID: 27295 · Rating: 0
John R. @ SETI.USA

Joined: 1 Jan 09
Posts: 15
Credit: 84,016,178
RAC: 0
Message 27314 - Posted: 8 Jul 2009, 16:35:49 UTC

There have been many well thought out answers posted to this thread.

I stand by my first post.

I am running 6 boxes here at home with SETI and MW 50/50.

2 have an ATI card, 4 don't.

I never saw that message about % run time and not finishing in time until I upgraded those 6 boxes to 6.6.36. I saw it on the Video crunchers and on my 3 quads.

I downgraded to 6.6.20 and haven't seen that message since.

So this old Redneck is assuming it had something to do with version 6.6.36.

Actually, the only reason I went to a 6 version was to try CUDA. Only have 8800GTs and abandoned running with CUDA.

5.10.45 was my version of choice and I'm thinking of going back to it.
ID: 27314 · Rating: 0
The Gas Giant

Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 27322 - Posted: 8 Jul 2009, 19:08:58 UTC

I have seen it nearly every day with the new versions of BOINC: the client downloads too much work for a project other than MW (MW itself can't over-fetch thanks to the 6 WU limit per core and the optimised app). In this scenario, because MW deadlines are shorter than those of other projects, BOINC then thinks it will not be able to complete MW work before the MW deadline until it has completed a few WUs of the second/third/fourth/etc. projects. I found that suspending some of the other WUs (before they were started) overcame this issue, as did going back to an older BOINC rev.
ID: 27322 · Rating: 0
Nightfire73

Joined: 4 Apr 08
Posts: 10
Credit: 32,029,317
RAC: 1,688
Message 27334 - Posted: 8 Jul 2009, 21:55:10 UTC - in response to Message 27182.  

Well OK, I run too many projects, but running only 2 doesn't solve the problem. Going back to an older version of BOINC does.
Maybe I really don't understand the scheduling process. I thought, as you all confirm in your answers, that the scheduler decides how many tasks to get.
So if my comp is overworked, the scheduler should not request new tasks. Right?
But it does, and the server answers "won't finish in time".
How does the server know that new tasks won't finish in time?
Does the scheduler tell the server?
Let's assume the scheduler does tell the server that the tasks won't finish in time. Why does it ask for new ones if it already knows they won't finish in time?
Everything worked fine with 6.6.20. Now with 6.6.36 I am suddenly overworked.
You say the longer running times for the MW tasks pushed my machine over the edge. Does going back to 6.6.20 shorten the running times? I guess not, but everything works fine again.
I may also add that I used to run 12 tasks on an Athlon XP 2000 and never got overdue tasks, maybe because the scheduler did what it is supposed to do.
ID: 27334 · Rating: 0
The Gas Giant

Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 27347 - Posted: 8 Jul 2009, 23:07:30 UTC

BOINC 6.6.36 -> the most FUBAR'd release since the last one......
ID: 27347 · Rating: 0
Bill

Joined: 3 Oct 07
Posts: 21
Credit: 49,862
RAC: 0
Message 27456 - Posted: 10 Jul 2009, 13:39:14 UTC - in response to Message 27334.  

So if my comp is overworked, the scheduler should not request new tasks. Right?
But it does, and the server answers "won't finish in time".
How does the server know that new tasks won't finish in time?
Does the scheduler tell the server?


BOINC wants work to fill its cache, so it tells the server "I want this much, and this is what I currently have" (a list of task durations and deadlines).

The server then sees if it has a task that can be put into the list without violating any deadlines.

For a long deadline project like Einstein the calculation simplifies to: sum of list durations + new task duration + now < new task deadline.

For a short deadline project like Milky it becomes: if I put the task at the top of the list, does that put any other tasks in danger of missing their deadline?

The server has to make the decision since Boinc doesn't know what task duration/task deadlines the server has.
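
The check described above can be sketched roughly as follows. This is only an illustration of the idea, not the actual BOINC scheduler code: the function name `can_accept`, the tuple layout, and the single-CPU simplification are all assumptions made for the example.

```python
def can_accept(queue, new_task, now=0.0):
    """Return True if new_task fits without any task missing its deadline.

    Each task is a (duration_hours, deadline_hours) tuple; deadlines are
    measured on the same clock as `now`. Assumes one CPU that runs tasks
    back to back in earliest-deadline-first order.
    """
    finish = now
    # Simulate running everything, soonest deadline first.
    for duration, deadline in sorted(queue + [new_task], key=lambda t: t[1]):
        finish += duration
        if finish > deadline:
            return False
    return True


# Long-deadline case: the new task simply goes to the end of the list.
queue = [(10.0, 200.0), (20.0, 300.0)]
print(can_accept(queue, (30.0, 336.0)))  # 10 + 20 + 30 = 60 h < 336 h

# Short-deadline case: the new task jumps ahead and delays a later one.
busy = [(40.0, 50.0), (30.0, 80.0)]
print(can_accept(busy, (12.0, 72.0)))
```

In the second call the 12-hour task itself would finish in time, but it pushes the 30-hour task past its 80-hour deadline, which is the kind of situation where a server could answer "won't finish in time".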
ID: 27456 · Rating: 0
Nightfire73

Joined: 4 Apr 08
Posts: 10
Credit: 32,029,317
RAC: 1,688
Message 27461 - Posted: 10 Jul 2009, 17:56:01 UTC - in response to Message 27456.  

Thank you for your time and the explanations Bill.
Then I guess there must be something wrong in what the scheduler tells the server.
I checked again... please tell me if I am wrong: I have 10 projects that run on 2 cores, which means I have 2x24 hours of processing time a day, say 4.8 hours of processing time per project per day (plus the CUDA card with 2 projects). So even with half that time I should be able to complete all the tasks way before the deadline. My cache is set to 0.5 days, so I usually have only 1 or 2 tasks per project waiting or running on my machine, and each project gets an equal resource share. I regularly get new work from all the projects; only MW sometimes comes up with the "won't finish in time" message, and only with BOINC 6.6.36. As an example, PrimeGrid has similar deadlines and similar running times, but I don't get strange messages from that server. Now my question is: did the earlier versions of BOINC not understand that my comp is overworked, or (as I guess) is there something wrong with 6.6.36 that makes the MW server think my machine is overworked? Maybe the scheduler adds the CUDA processing times to the equation, making the server believe there is not enough time for another MW task?
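
The arithmetic above can be written out as a quick check. This is a back-of-the-envelope sketch only: it assumes equal resource shares, ignores the CUDA projects, and uses the 3-day Milkyway deadline mentioned elsewhere in this thread.

```python
cores = 2
projects = 10
hours_per_day = 24
deadline_days = 3  # Milkyway deadline quoted elsewhere in the thread

core_hours_per_day = cores * hours_per_day          # 48 core-hours
share_per_project = core_hours_per_day / projects   # equal resource share

print(f"{share_per_project:.1f} core-hours per project per day")
print(f"{share_per_project * deadline_days:.1f} core-hours per project per deadline")
```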

ID: 27461 · Rating: 0
Odd-Rod

Joined: 7 Sep 07
Posts: 442
Credit: 1,429,632
RAC: 85
Message 27578 - Posted: 12 Jul 2009, 11:42:17 UTC
Last modified: 12 Jul 2009, 11:44:47 UTC

This is close to cross-posting (see thread "boinc 6.6.36 version") and I apologize, but I've tried to only post according to the thread topics. (And I didn't Copy & Paste!)

I have also experienced the problems mentioned in this thread on the only host I had "up"graded to 6.6.36. I have gone back to 6.6.28 on it and it now gets MW WUs - and finishes them in time. I should also mention that I'm talking only of CPU here. [Edit] The downgrade was the only change I made on that host [/Edit]

The problem was seen only on the 6.6.36 host and, I must add, only at Milkyway. So, while it seems to be caused by 6.6.36, there is something at MW that contributes to the problem. I'll leave it to BOINC and MW to 'fight' it out.

I'm just happy that I can crunch here with that host again.

Rod
ID: 27578 · Rating: 0
Nightfire73

Joined: 4 Apr 08
Posts: 10
Credit: 32,029,317
RAC: 1,688
Message 27711 - Posted: 13 Jul 2009, 22:31:04 UTC
Last modified: 13 Jul 2009, 22:31:24 UTC

Hey folks... just installed 6.6.37. Now let's see what this one does...
;-))
ID: 27711 · Rating: 0
Nightfire73

Joined: 4 Apr 08
Posts: 10
Credit: 32,029,317
RAC: 1,688
Message 27713 - Posted: 13 Jul 2009, 23:48:12 UTC

Yeah, right... call me when 6.7. is ready.
ID: 27713 · Rating: 0
John Lewis Highsmith

Joined: 15 Dec 08
Posts: 2
Credit: 9,589,248
RAC: 1,688
Message 27726 - Posted: 14 Jul 2009, 5:43:04 UTC
Last modified: 14 Jul 2009, 6:10:34 UTC

I was running 6.6.36 and getting the "won't finish . . ." message. I dropped back to 6.6.26 and in short order received 4 MW. When they were almost completed I switched back to 6.6.36. The MW completed and the next time a request for MW was made, the "won't finish . . ." reappeared.

Shortly after submitting the above I decided to return to 6.6.26. 5 MW popped up.
ID: 27726 · Rating: 0
Brian Silvers

Joined: 21 Aug 08
Posts: 625
Credit: 558,425
RAC: 0
Message 27728 - Posted: 14 Jul 2009, 6:05:05 UTC - in response to Message 27726.  
Last modified: 14 Jul 2009, 6:25:35 UTC

I was running 6.6.36 and getting the "won't finish . . ." message. I dropped back to 6.6.26 and in short order received 4 MW. When they were almost completed I switched back to 6.6.36. The MW completed and the next time a request for MW was made, the "won't finish . . ." reappeared.


You have tasks over at Cosmology that take an average of 23.51 hours for each task based on the average of the two tasks that have times there right now. You are also attached to Spinhenge, World Community Grid, Rosetta, RALPH, and SETI, all of which had some credits within the past week (WCG, RALPH, and Spinhenge, along with Milkyway in the past 24 hours). So, when figuring out what needs to happen in the next 3 days, resource share must be taken into account, as well as how much work needs to be done for each project you're attached to that has work sitting on your machine so that all of the work can complete within deadline.

I still maintain that people with overworked systems would be seeing these messages REGARDLESS of the BOINC version. While I don't dispute that 6.6.36 appears to have issues based on what a number of people are saying, it is not to blame for any and all issues in regards to people being told that their systems won't get work finished by deadlines.

IMO, YMMV, etc, etc, etc....
ID: 27728 · Rating: 0
John Lewis Highsmith

Joined: 15 Dec 08
Posts: 2
Credit: 9,589,248
RAC: 1,688
Message 27731 - Posted: 14 Jul 2009, 7:18:30 UTC - in response to Message 27728.  
Last modified: 14 Jul 2009, 7:23:58 UTC

Admittedly there is a heavy load on my plate. But, there are a couple of mitigators.

The most important is that the Cosmology WUs are not due until 7/27. One has already had work (19:06) done on it and has but 2:37 to go; a second started as a 17+ and has had 2:45 done with 11:51 left. The third has not started and has 27:03 to completion. Most of the WUs don't take the initial indicated TTC (time to completion). It is off, on my machine, by about 14%.

The second mitigator is a dual processor, which works well enough that I have had no work finish late. My computer runs 24/7, and most of that is BOINC time.

So with approximately 240 hours of work, and with it spread out the way it is, there should still be no work left behind.

I'll post how things come out on or about 8/3.
ID: 27731 · Rating: 0
Nightfire73

Joined: 4 Apr 08
Posts: 10
Credit: 32,029,317
RAC: 1,688
Message 27772 - Posted: 15 Jul 2009, 3:22:42 UTC - in response to Message 27728.  

I still maintain that people with overworked systems would be seeing these messages REGARDLESS of the BOINC version. While I don't dispute that 6.6.36 appears to have issues based on what a number of people are saying, it is not to blame for any and all issues in regards to people being told that their systems won't get work finished by deadlines


Brian, I agree with you. If our computers are overworked, we should get such messages (or no work) from all the projects, regardless of the BOINC version.
And I can assure you, my computer is not overworked: The tasks all finish way before the deadline, and I do not get to see (ever) tasks running in high priority mode, contrary to what I see on other computers that may be turned off for some days and then go haywire when I turn them back on again. These are overworked indeed.
Now, if we want to see that there is an issue here and maybe try to find elements that help solve it, then I am more than happy to describe to the world what happens on my machines. But if the game is to blame all the overworked machines that suddenly turned up with the release of boinc 6.6.36, then I may go back to some previous version and crunch on happily as I did before with my overworked machine that does get all the jobs done in time, including MW.
I feel like a few years back, when I went to buy new tyres for my car. The guy in the tyre shop went on telling me that the size I was asking for did not exist and that it never was produced by any tyre manufacturer. I took him to the car, had him read the inscription on the side of my non-existent tyres, and went to buy the non-existent things elsewhere.
ID: 27772 · Rating: 0
Brian Silvers

Joined: 21 Aug 08
Posts: 625
Credit: 558,425
RAC: 0
Message 27804 - Posted: 15 Jul 2009, 15:55:53 UTC - in response to Message 27772.  
Last modified: 15 Jul 2009, 16:22:19 UTC

I still maintain that people with overworked systems would be seeing these messages REGARDLESS of the BOINC version. While I don't dispute that 6.6.36 appears to have issues based on what a number of people are saying, it is not to blame for any and all issues in regards to people being told that their systems won't get work finished by deadlines


Brian, I agree with you. If our computers are overworked, we should get such messages (or no work) from all the projects, regardless of the BOINC version.


I read the later part of your post that describes wanting to have two-way communication. That's fine, but you also need to be receptive to the same two-way communication. Other projects may not give you these messages because they have extremely lengthy deadlines. In the case of SETI, some deadlines can be 2-3 months out. One thing that several people worked on over there, notably Richard Hasselgrove and Joe (??? on the last name for him, Segur maybe???), was to convince SETI that their deadlines were too long, which enabled people to build up huge stockpiles of work that would end up causing these types of issues with other projects as well as a higher abandon rate there at SETI. Here though, the deadlines are in 3 days. The shorter the amount of time you have to work with, the more your system has to be dedicated towards that project.

Additionally, many participants would yell and scream about "Earliest Deadline First" / "High Priority", so efforts were made to suppress alerting people to those types of things. Even my old 5.8.16 version does not give me a message about EDF for these tasks here, although from looking at the behavior of what it does, it clearly is in some form of EDF, most likely Connect Interval (cache setting) based scheduling. Example: During the night I had a few Milkyway tasks and 2 Cosmology tasks on my system, neither of which were suspended, but neither had been started. The first Cosmology task was not started until the last Milkyway task had been finished. My settings are for a 3-day cache. Deadlines are in 3 days. It makes the effort to complete the work that is due the soonest first, even though for 6 tasks from here I need a maximum of 12 hours of CPU time across 3 days and my average turnaround time is 0.24 days.
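
The behavior described above, finishing the Milkyway tasks before touching Cosmology, is what a plain "Earliest Deadline First" ordering produces. A minimal sketch, assuming a made-up queue: the 2-hour Milkyway runtime follows from the 12 hours for 6 tasks mentioned above, while the Cosmology runtime and the 14-day deadline are hypothetical illustration values, and the real client logic is certainly more involved than a sort.

```python
from collections import namedtuple

Task = namedtuple("Task", "project hours deadline_days")

# Hypothetical queue, listed in the order the tasks were downloaded.
queue = [
    Task("Cosmology", 20.0, 14),
    Task("Milkyway", 2.0, 3),
    Task("Cosmology", 20.0, 14),
    Task("Milkyway", 2.0, 3),
]

# Earliest Deadline First: run whatever is due soonest. sorted() is stable,
# so tasks with equal deadlines keep their original relative order.
run_order = sorted(queue, key=lambda t: t.deadline_days)

for task in run_order:
    print(task.project, task.deadline_days)
```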

One other thing is I'm wondering if version changes mess with the local duration correction factor similar to a reset project.

As I have said, I do not dispute that 6.6.36 seems to have problems. However, task runtimes here are doubled compared to a few months ago, lengthy deadline projects will behave differently than tight deadline projects, and your changing versions back and forth may be misleading you. If you want to ignore those variables in trying to solve your issue, that's your choice. I'm just reminding you that they're there...
ID: 27804 · Rating: 0
Odd-Rod

Joined: 7 Sep 07
Posts: 442
Credit: 1,429,632
RAC: 85
Message 27820 - Posted: 15 Jul 2009, 22:16:17 UTC - in response to Message 27804.  

As I have said, I do not dispute that 6.6.36 seems to have problems. However, task runtimes here are doubled compared to a few months ago, lengthy deadline projects will behave differently than tight deadline projects, and your changing versions back and forth may be misleading you. If you want to ignore those variables in trying to solve your issue, that's your choice. I'm just reminding you that they're there...


I hope this doesn't come across badly - I really am saying this respectfully to your comment on the number of variables playing a part here.

For me the issue I (and others) have/had is: not getting work and receiving a message from Milkyway that it won't send work because it 'Won't finish in time', when using 6.6.36. When using 6.6.28 I get work from Milkyway and other projects and they all do 'finish in time'. So Boinc and/or Milkyway do need to consider the real cause of the problem, but I don't. My issue is solved by dropping back from 6.6.36.

Hopefully the next Boinc version will solve the issue properly. Until then, no 6.6.36 for me.

Rod
ID: 27820 · Rating: 0
Brian Silvers

Joined: 21 Aug 08
Posts: 625
Credit: 558,425
RAC: 0
Message 27828 - Posted: 16 Jul 2009, 3:44:53 UTC - in response to Message 27820.  

As I have said, I do not dispute that 6.6.36 seems to have problems. However, task runtimes here are doubled compared to a few months ago, lengthy deadline projects will behave differently than tight deadline projects, and your changing versions back and forth may be misleading you. If you want to ignore those variables in trying to solve your issue, that's your choice. I'm just reminding you that they're there...


I hope this doesn't come across badly - I really am saying this respectfully to your comment on the number of variables playing a part here.

For me the issue I (and others) have/had is: not getting work and receiving a message from Milkyway that it won't send work because it 'Won't finish in time', when using 6.6.36. When using 6.6.28 I get work from Milkyway and other projects and they all do 'finish in time'. So Boinc and/or Milkyway do need to consider the real cause of the problem, but I don't. My issue is solved by dropping back from 6.6.36.

Hopefully the next Boinc version will solve the issue properly. Until then, no 6.6.36 for me.

Rod


Take a look at client_state.xml. Find the Milkyway section, and then note your values for:

<short_term_debt>0.000000</short_term_debt>
<long_term_debt>-488763.127288</long_term_debt>
<resource_share>100.000000</resource_share>
<duration_correction_factor>1.293284</duration_correction_factor>

My current values are in what I just pasted... My long_term_debt is large because I'm only really participating here now...Cosmology sporadically...

Next install a different version, and make sure none of those values change. If they do not, then there probably needs to be a "sanity check" to make sure that you really should not be getting those messages. There is a possibility that 6.6.36 is actually working correctly and that the other versions were not giving you the correct info. Remember that the message coming back to you is a guesstimate, so there could be tolerance percentages involved in the determination. It also considers system uptime, so if you have had your system powered off more recently, that will impact the determination. It is also interesting to note that most (nearly all?) people experiencing this are attached to multiple projects and appear to be running a heavy workload, and this project's work did recently increase in runtime, so all of those factors should be weighed against there being "a problem", regardless of how "obvious" it might seem that 6.6.36 is "broken".
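
One way to make that before/after comparison less error-prone is to pull the four values out with a small script. This is a sketch under assumptions: it presumes each project sits in a `<project>` element with a `<project_name>` child, the helper name `project_values` is made up for the example, and the sample below is a cut-down stand-in for a real client_state.xml using the values pasted above.

```python
import xml.etree.ElementTree as ET

FIELDS = ("short_term_debt", "long_term_debt",
          "resource_share", "duration_correction_factor")


def project_values(xml_text, name_fragment):
    """Return the scheduling fields of the first project whose
    <project_name> contains name_fragment (case-insensitive)."""
    root = ET.fromstring(xml_text)
    for project in root.iter("project"):
        name = project.findtext("project_name", default="")
        if name_fragment.lower() in name.lower():
            return {f: float(project.findtext(f, default="nan")) for f in FIELDS}
    return None


# Cut-down stand-in for client_state.xml (assumed layout).
sample = """<client_state>
  <project>
    <project_name>Milkyway@home</project_name>
    <short_term_debt>0.000000</short_term_debt>
    <long_term_debt>-488763.127288</long_term_debt>
    <resource_share>100.000000</resource_share>
    <duration_correction_factor>1.293284</duration_correction_factor>
  </project>
</client_state>"""

before = project_values(sample, "milkyway")
print(before)
```

To use it on a real host, read the contents of client_state.xml from your BOINC data directory into a string, run it once before and once after the version change, and compare the two dictionaries.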

John McLeod handles the scheduling, I believe. You might also check with Richard Hasselgrove to see if perhaps he's tested and found something odd with 6.6.36. Since you don't have Vista or 7, you honestly don't need any version newer than 5.10.45 unless one of the other projects you're participating in requires a higher minimum version.
ID: 27828 · Rating: 0
Raistmer*

Joined: 27 Jun 09
Posts: 85
Credit: 18,592,228
RAC: 0
Message 28115 - Posted: 21 Jul 2009, 16:18:19 UTC
Last modified: 21 Jul 2009, 16:22:01 UTC

Only 3 projects active, only MW is able to request work, MW has the biggest project share, ATI GPU opt app installed, HD4870, so a task completes in ~3 mins.
4-day work cache, which was not full at that moment.
But after some hours of host outage, MW started to send messages about BOINC running only 80% of the time and refused to give work.
BOINC 6.6.36.
IMO it's a problem with BOINC 6.6.36 specifically. All MW debts were zero (LTD) or positive (STD).

ADDON: MW gives work even in this situation, but only when all previous work has been uploaded and reported. This introduces delays and leaves the GPU idle. Surely there were no deadline misses before, and with a cache of only 24 tasks (the max possible value for my host, it's a quad) at 3 min per task there never could be.
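
To put rough numbers on that last point, here is a quick estimate. It is a sketch only: it assumes the GPU runs one task at a time, that the 80% figure simply stretches wall-clock time, and that the deadline is the 3 days quoted earlier in this thread.

```python
tasks_in_cache = 24        # maximum for this quad-core host
minutes_per_task = 3       # GPU runtime per task
on_fraction = 0.80         # share of time BOINC reports it is running
deadline_days = 3          # Milkyway deadline quoted earlier in the thread

compute_minutes = tasks_in_cache * minutes_per_task   # 72 minutes of work
wall_minutes = compute_minutes / on_fraction           # stretched by downtime
deadline_minutes = deadline_days * 24 * 60             # 4320 minutes

print(f"wall-clock needed: {wall_minutes:.0f} min of {deadline_minutes} min")
print("fits" if wall_minutes < deadline_minutes else "does not fit")
```

Even with the 80% penalty, a full cache needs about an hour and a half of wall-clock time against a deadline of three days.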
ID: 28115 · Rating: 0

©2020 Astroinformatics Group