Message boards :
Number crunching :
No work
Author | Message |
---|---|
Send message Joined: 15 Jul 08 Posts: 12 Credit: 73,431,791 RAC: 0 |
Hi all, I haven't received any jobs since 4 PM (3 PM UT). What's wrong with the server? Thanks |
Send message Joined: 7 Nov 08 Posts: 14 Credit: 180,768,799 RAC: 0 |
Today I've seen this message a few times: "Project communication failed: attempting access to reference site", "Internet access OK - project servers may be temporarily down.", "Milkyway@home|Scheduler request failed: Timeout was reached" :( |
Send message Joined: 1 Dec 08 Posts: 139 Credit: 8,721,208 RAC: 0 |
"Project communication failed: attempting access to reference site" A HA! Thanks for posting this. I have been dealing with some Internet connectivity issues on some of my boxes today. What was weird was that, at one point, on one box, the BOINC client would access the web sites, but IE refused. So, anyway, now I know that the "project communication..." message isn't necessarily a symptom. Thanks. |
Send message Joined: 9 Sep 08 Posts: 96 Credit: 336,443,946 RAC: 0 |
There seem to be many more 'got 0 new work' replies multiple times in succession again, and I had one machine dry (CPU app) at work this am. Most machines manage to get something and keep the cache alive, but the frequency of the 'got 0 new tasks' seems to be increasing slowly day by day, for me anyway. Also, it may request 140,000 seconds of new work but only get 1 task in return. That happens a lot, so more often than not it's not running with the maximum 24 tasks on deck... |
Send message Joined: 9 Feb 09 Posts: 166 Credit: 27,520,813 RAC: 0 |
Well, for a few days now I have been getting a steady flow of units, so it seems sorted for me now. |
Send message Joined: 7 Nov 08 Posts: 14 Credit: 180,768,799 RAC: 0 |
My 2-core PCs are "idling"; they get new WUs only sporadically (only 30-50% of their power is used, I guess). :( Again and again... Milkyway@home|Scheduler request completed: got 0 new tasks |
Send message Joined: 24 Dec 07 Posts: 1947 Credit: 240,884,648 RAC: 0 |
I'm seeing on my quad core with an ATI4850 GPU installed that I only get new work once the cache has been fully emptied. I may then get anywhere from 1 to 24 tasks, then get no more work, even though it is being asked for, until the last WU has been crunched and uploaded. It appears as though a speed limit has been implemented along the lines of: i. if the computer has work and is asking for work, then don't give it any; ii. if the computer has no work and is asking for work, then give it what is available, up to a maximum of npus * 6. If there was no work available when ii. is hit, then BOINC's backoff regime can put off requesting new work for up to 24 hrs. @#$%! stupid really. |
Send message Joined: 15 Jul 08 Posts: 288 Credit: 5,474,012 RAC: 0 |
Is the backoff controlled and programmed into Boinc, or does the project have some control over it? I am the Kittyman. Please visit and give a Click for Seti City. |
Send message Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
I believe the projects control it. It was quicker to get to an hour for MW, now it usually takes a few times. Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. |
Send message Joined: 15 Jul 08 Posts: 288 Credit: 5,474,012 RAC: 0 |
If that is true, then Travis could alleviate some of the problems by limiting the maximum backoff to an hour or two at most, and then going back to 1-minute intervals to ping the servers for more work. I am the Kittyman. Please visit and give a Click for Seti City. |
Send message Joined: 12 Apr 08 Posts: 621 Credit: 161,934,067 RAC: 0 |
The problem is that if the server is barely hanging in there with the number of requests, these limits and changes will increase the loads. That is the whole point to the exponential back-off so that as the client gets unserved it waits a bit longer each time. I am not sure that we have completely solved the work-flow problem as in we are pulling work off the server faster than the system can make new tasks and put it into the server queues ... It will only get worse as more people adopt the ATI GPU application ... |
Send message Joined: 12 Apr 08 Posts: 621 Credit: 161,934,067 RAC: 0 |
The problem is that if the server is barely hanging in there with the number of requests, these limits and changes will increase the loads. That is the whole point to the exponential back-off so that as the client gets unserved it waits a bit longer each time. I am not sure that we have completely solved the work-flow problem as in we are pulling work off the server faster than the system can make new tasks and put it into the server queues ... It will only get worse as more people adopt the ATI GPU application ... |
Send message Joined: 13 Feb 08 Posts: 1124 Credit: 46,740 RAC: 0 |
"It will only get worse as more people adopt the ATI GPU application ..." It's making my eyesight worse, I am seeing double now. |
Send message Joined: 27 Aug 07 Posts: 915 Credit: 1,503,319 RAC: 0 |
It will only get worse as more people adopt the ATI GPU application ... D'OH! me@rescam.org |
Send message Joined: 15 Jul 08 Posts: 288 Credit: 5,474,012 RAC: 0 |
"The problem is that if the server is barely hanging in there with the number of requests, these limits and changes will increase the loads. That is the whole point to the exponential back-off so that as the client gets unserved it waits a bit longer each time." So the question remains: where, actually, is the bottleneck? It would not appear to be the splitter, as the server status page always shows hundreds of WUs ready to send (at least when I have checked). And it would not appear to be the available bandwidth to the server or the router, as I have not seen any 'unable to connect to project' or HTTP errors such as have been the problem at Seti when their bandwidth gets saturated. Just the no work from project message. If I recall correctly, Seti seemed to have traced the problem to communications between the master database, the feeder, and the scheduler. The work is split and available, but the feeder cannot hand it to the scheduler fast enough, or a similar scenario. So how to fix that? As opposed to Seti, it would appear that here all processes for the project are on a single computer, Milkyway. Is it possible to give some processes a higher priority to let them do their work? A second feeder process? A larger feeder cache? I don't know such things. I think that Travis had been consulting with Dr. A. about the problem, and had set some 'counters' to try to analyze it, but I don't think I have seen that he got any hard answers from the exercise. I am the Kittyman. Please visit and give a Click for Seti City. |
Send message Joined: 9 Sep 08 Posts: 96 Credit: 336,443,946 RAC: 0 |
Another quad machine (XP Home) dry this am (CPU app) :( Out of 8 manual primes for new work, only one successfully returned new work, and then it was only 6 WUs; the other 7 attempts returned 'got 0 new tasks'. It had not 'called home' for 1 hour and was not scheduled for another attempt for another hour, so that is 2 hours sitting with completed WUs and no further attempts to get more work. With further manual primes it is now back to the full allotment of 24 WUs. It seems that running dry is a bit of bad-luck timing, but obviously there are many, many times when the server is not doling out new work for whatever reason. |
Send message Joined: 28 Apr 08 Posts: 1415 Credit: 2,716,428 RAC: 0 |
Guess I've been lucky, I haven't run out of WUs in weeks. I'm only running 2 machines though (a single core 'n' a double core). Now knocking on wood :-) |
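
For anyone who wants to play with the numbers discussed in this thread, here is a rough Python sketch of two things posters described: the guessed server rule (no work while the host still has tasks, otherwise up to npus * 6) and a capped exponential backoff. Everything in it is an assumption for illustration: the function names, the 60-second base delay, and the plain doubling are made up, and this is neither the project's scheduler code nor the BOINC client's real backoff, which uses its own constants and randomization.

```python
def tasks_granted(tasks_on_host, tasks_available, n_units, per_unit_limit=6):
    """Guessed server rule from the thread (hypothetical, not confirmed).

    n_units stands in for the "npus" mentioned above; per_unit_limit=6
    matches the observed maximum of 24 tasks on a quad core.
    """
    if tasks_on_host > 0:
        # Rule i: the host still has work, so send nothing.
        return 0
    # Rule ii: the host is empty, so send what is available up to the cap.
    return min(tasks_available, n_units * per_unit_limit)


def next_backoff(failures, base=60, cap=24 * 3600):
    """Seconds to wait after `failures` consecutive empty replies.

    Plain doubling from an assumed 60-second base, limited to `cap`.
    The 24-hour default mirrors the worst case complained about above.
    """
    if failures <= 0:
        return 0
    return min(cap, base * 2 ** (failures - 1))


if __name__ == "__main__":
    # Compare the 24-hour ceiling with the one-hour ceiling suggested above.
    for n in range(1, 13):
        print(n, next_backoff(n), next_backoff(n, cap=3600))
```

With the one-hour cap, the waits run 60, 120, 240, 480, 960, 1920 and then stay at 3600 seconds, so a dry host asks again at least once an hour. The trade-off raised in the replies still applies: a lower cap means more scheduler requests hitting a server that may already be struggling.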
©2024 Astroinformatics Group