No work

Seejay (Joined: 22 Dec 07 · Posts: 51 · Credit: 2,405,016 · RAC: 0)
Message 14711 - Posted: 10 Mar 2009, 11:31:12 UTC
Here we go again: I came in from work and both CPU boxes were idle (I don't have an ATI card ;^)). Everything had been running smoothly since the end of February, so what's changed? IMO, as I stated previously, the feeder can't keep the scheduler queue sufficiently fed! The number of GPUs is increasing exponentially, so the scheduler queue must be enlarged. It's not much good for the crunchers, or the project, if we're all waiting for work that's there but cannot be had!
Seejay
Proud Member and Founder of BOINC Team Allprojectstats.com

banditwolf (Joined: 12 Nov 07 · Posts: 2425 · Credit: 524,164 · RAC: 0)
Message 14720 - Posted: 10 Mar 2009, 13:26:12 UTC
Possibly when the stats page reads 648 to send, 800 are being requested, which doesn't allow some of us to receive work.
Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it.

JAMC (Joined: 9 Sep 08 · Posts: 96 · Credit: 336,443,946 · RAC: 0)
Message 14723 - Posted: 10 Mar 2009, 14:02:21 UTC
And another quad just went dry; that's two in 4-5 hours :(

msattler (Joined: 15 Jul 08 · Posts: 288 · Credit: 5,474,012 · RAC: 0)
Message 14724 - Posted: 10 Mar 2009, 14:05:57 UTC - in response to Message 14720.
banditwolf wrote: "Possibly when the stats page reads 648 to send, 800 are being requested, which doesn't allow some of us to receive work."
That is certainly a possibility. Travis had mentioned increasing the size of the ready-to-send cache. His last post, 3 days ago, said he was down with the flu. Hopefully he is feeling better and can respond to this thread soon with some new info.
I am the Kittyman. Please visit and give a Click for Seti City.

Chris S (Joined: 20 Sep 08 · Posts: 1391 · Credit: 203,563,566 · RAC: 0)
Message 14726 - Posted: 10 Mar 2009, 14:17:08 UTC
Hiyah, Mark! Nice to see you back. Apparently Travis hasn't been too well, which does account for the lack of response. They say patience is a virtue, but virtues can be fast-tracked sometimes :-)
Don't drink water, that's the stuff that rusts pipes

banditwolf (Joined: 12 Nov 07 · Posts: 2425 · Credit: 524,164 · RAC: 0)
Message 14731 - Posted: 10 Mar 2009, 14:45:26 UTC - in response to Message 14724.
Maybe the one server is already maxed out? Either giving that function priority or increasing the cache (2-3x) would work, assuming either can be done.

msattler (Joined: 15 Jul 08 · Posts: 288 · Credit: 5,474,012 · RAC: 0)
Message 14760 - Posted: 10 Mar 2009, 18:00:00 UTC - in response to Message 14731.
There might be other possibilities, depending on how close the Milkyway server is to its processing capacity. Seti runs separate upload and download servers on separate computers; whether this could be done as separate processes on the same computer, I am not sure. Seti also runs multiple splitters on the same computer, so maybe that could be done here if it turns out that splitting capacity is part of the problem. Of course, if the Milkyway server is pretty much at the limit of its processing power, then the only answer is to add another server to share the load and split some of the processes between the two, if Travis wants his project to continue to grow.

Lloyd M. (Joined: 1 Dec 08 · Posts: 139 · Credit: 8,721,208 · RAC: 0)
Message 14822 - Posted: 10 Mar 2009, 22:23:06 UTC - in response to Message 14760.
msattler wrote: "Of course, if the Milkyway server is pretty much at the limit of its processing power, then the only answer is to add another server to share the load and split some of the processes between the two, if Travis wants his project to continue to grow."
Hard to tell. I think it's safe to say that SETI's volunteer base is at least a couple of orders of magnitude larger than this project's, so I don't know if it's a fair comparison. It might be better to find one or more projects with similarly sized WUs and user base, and find out how they are provisioned and configured on the server end.
I think Travis is continuing his efforts to find the bottleneck, and it doesn't appear that he has found it yet. Clearly there are a LOT of moving parts here, and (given the symptoms) this could be something as simple as a single router that's not up to the task or is experiencing intermittent problems. And (especially given the issues SETI is having), it's not clear to me how easy it actually is to scale BOINC server apps.
Most of my boxes have had a much smaller queue of WUs for this project than usual. This is especially telling, given that I've stepped up processing for other projects by a considerable amount. I may even have run out of WUs for this project from time to time. Interestingly enough, the one box I haven't seen subject to this phenomenon is a dual 2.5 GHz P4 Xeon Ubuntu server. Now, I'm not saying that it never has been; I haven't been watching that closely, and it's strictly CLI, so it's harder to track. And before I get chastised again for some kind of BOINC malpractice (or ignorance): BOINCview doesn't want to work with the version of the client on this box, nor does the client on my main box want to talk to the (much newer) client on this box. And I have other things to do with my time right now than update the BOINC client on my main box.
I'm hoping that the posts I made a while back, first when I was still getting WUs with no problem while many others were not, and then when I did have some problems, might help Travis with troubleshooting.
A final thought: I just checked, and it appears that the results aren't on "insta-purge", so that should be a great help in tracking down the WU supply problem.

Brian Silvers (Joined: 21 Aug 08 · Posts: 625 · Credit: 558,425 · RAC: 0)
Message 14877 - Posted: 11 Mar 2009, 4:25:39 UTC - in response to Message 14822.
Lloyd M. wrote: "And (especially given the issues SETI is having), it's not clear to me how easy it actually is to scale BOINC server apps."
Their problem has been, and continues to be, a lack of money. IMO, they should turn off new account creation and focus on getting their server farm stable, then start upgrading equipment with whatever money they can come up with, still leaving new account creation turned off. Once it appears that there is some breathing room, new account creation could be turned back on...

Alinator (Joined: 7 Jun 08 · Posts: 464 · Credit: 56,639,936 · RAC: 0)
Message 14878 - Posted: 11 Mar 2009, 5:46:08 UTC
We've been over this already, guys. They know what the problem is, and SAH has run up against it too. The scheduler's task queue size is limited by how big a shared memory chunk you can afford to allocate to it. Apparently, though, MW shares its server resources with other users, and Travis said previously that he was going to have to interface with other lab staff to see if they can up the ante, so to speak, to help alleviate the problem on a more permanent basis. Of course, his coming down with the 'plague' recently hasn't helped matters in this regard. ;-)
Alinator
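The limit Alinator describes can be illustrated with a toy model. The feeder copies tasks into a fixed-size shared-memory array, and each scheduler request can only hand out what is currently in that array. The sketch below is a hypothetical simulation, not BOINC's feeder or scheduler code. `SharedTaskArray`, `simulate` and all the numbers are made up for illustration, and it assumes one refill per feeder pass with all requests in that pass arriving afterwards.

```python
from dataclasses import dataclass


@dataclass
class SharedTaskArray:
    """Hypothetical stand-in for the scheduler's fixed-size shared-memory
    task array. `capacity` is an assumed number, not MilkyWay's setting."""
    capacity: int
    ready: int = 0

    def refill(self, backlog: int) -> int:
        """One feeder pass: top the array up from the ready-to-send backlog.
        Returns how many tasks were moved into the array."""
        moved = min(self.capacity - self.ready, backlog)
        self.ready += moved
        return moved

    def take(self, wanted: int) -> int:
        """One scheduler request: hand out whatever is in the array, up to `wanted`."""
        given = min(wanted, self.ready)
        self.ready -= given
        return given


def simulate(capacity: int, backlog: int, requests_per_pass: int,
             tasks_per_request: int, passes: int) -> tuple[int, int]:
    """Return (tasks_sent, requests_that_got_zero) over `passes` feeder cycles.
    Assumes all requests in a cycle arrive after that cycle's single refill."""
    array = SharedTaskArray(capacity)
    sent = empty = 0
    for _ in range(passes):
        backlog -= array.refill(backlog)
        for _ in range(requests_per_pass):
            got = array.take(tasks_per_request)
            sent += got
            if got == 0:
                empty += 1
    return sent, empty


# Same backlog and demand, two array sizes:
print(simulate(capacity=100, backlog=10_000, requests_per_pass=30,
               tasks_per_request=6, passes=5))  # (500, 65)
print(simulate(capacity=200, backlog=10_000, requests_per_pass=30,
               tasks_per_request=6, passes=5))  # (900, 0)
```

In this toy model the backlog never runs dry. Even so, with 100 slots, 13 of every 30 requests come back with 0 tasks, while doubling the array serves them all. That is the "work that's there but cannot be had" pattern, and it shows why the fix depends on how much shared memory can be spared.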

msattler (Joined: 15 Jul 08 · Posts: 288 · Credit: 5,474,012 · RAC: 0)
Message 14879 - Posted: 11 Mar 2009, 5:55:12 UTC - in response to Message 14822. Last modified: 11 Mar 2009, 5:55:48 UTC
I was not trying to draw a direct comparison between Milkyway and Seti. Of course Seti is magnitudes larger than Milkyway. I was simply using Seti as an example of some things that can be done within the BOINC framework to handle a larger user base. Seti's troubles are their own, and I do not wish to discuss their problems here, but if they can scale things to get even close to properly handling their user base (which they do, much of the time), then it certainly is possible to scale BOINC to handle the user base of Milkyway. I have no idea how heavily loaded the Milkyway server is, what Travis has tried or considered, or what his budget may or may not be.
I am sure that Travis is still looking into the matter, and I hope he can give us some new insights into what he has found soon. I wish him all success in his troubleshooting.

The Gas Giant (Joined: 24 Dec 07 · Posts: 1947 · Credit: 240,884,648 · RAC: 0)
Message 14880 - Posted: 11 Mar 2009, 6:18:13 UTC - in response to Message 14879.
Well said, Mark.

Lloyd M. (Joined: 1 Dec 08 · Posts: 139 · Credit: 8,721,208 · RAC: 0)
Message 14951 - Posted: 11 Mar 2009, 22:27:47 UTC - in response to Message 14879. Last modified: 11 Mar 2009, 22:30:20 UTC
msattler wrote: "I have no idea how heavily loaded the Milkyway server is, what Travis has tried or considered, or what his budget may or may not be."
I agree with "The Gas Giant": well said. Your points are well taken. I'm just trying to be helpful here (not argumentative) by making distinctions and clarifying things.
"Alinator" reports that MW is using part of one server. Yikes! Surely something can be done about this.
I've personally had good luck getting perfectly adequate servers from the surplus market. One was an HP-Compaq dual P4 Xeon 2.5 GHz with HT, 1 GB memory, 2 x 36 GB HDDs (four bays open) and all the hot-swap fans, which I got for $75, the minimum eBay bid. That was partly because I could save on shipping by doing local pickup, but I've seen similar machines (albeit with no drives) for similar prices with shipping included. On my server, I was able to upgrade the memory to 2.5 GB for about $40, and the 512 MB I removed in the process went into another server. I bought another one for a little more that had three 36 GB and two 18 GB drives. I'm going to run the smaller ones as RAID 1 on that server, and I have moved the three bigger drives to the "main" server as RAID 5.
I've found the kicker to be the hard drives. They usually have a lot of hours on them by the time they hit the surplus market, and I don't believe that is acceptable for mission-critical applications. In other cases, no drives or trays are included at all (trays can usually be found on eBay for cheap). I keep thinking there has to be some source of good hot-swap server drives at reasonable prices (perhaps some kind of overstock in smaller capacities). If you have enough bays (at least a 2U server, or even a 4U server), each drive being on the small side by contemporary standards shouldn't matter.

msattler (Joined: 15 Jul 08 · Posts: 288 · Credit: 5,474,012 · RAC: 0)
Message 14996 - Posted: 12 Mar 2009, 6:11:13 UTC - in response to Message 14951.
Thank you, Lloyd and Gas Giant. Unfortunately, Travis has not posted here in 5 days, so it is hard to comment much further until he has had time to post his thoughts on our observations and suggestions. I hope it is not still his health that is keeping him from posting.

arkayn (Joined: 14 Feb 09 · Posts: 999 · Credit: 74,932,619 · RAC: 0)
Message 15009 - Posted: 12 Mar 2009, 12:37:37 UTC - in response to Message 14996.
Looks like he's feeling a little better:
New Searches, March 11, 2009: "Sorry for the lack of communication recently. I'm back from my flu so it should be a bit better now :) I started up some new searches, which should hopefully do a better job avoiding the edges of the search space, which were causing some of the weird-acting, very long workunits. Should be starting up a few new types of searches in the next day or so as well, more on that later."

msattler (Joined: 15 Jul 08 · Posts: 288 · Credit: 5,474,012 · RAC: 0)
Message 15010 - Posted: 12 Mar 2009, 12:40:24 UTC - in response to Message 15009.
That's good news! I am sure he has some catching up to do, but hopefully he can give us his thoughts on this thread soon. Glad you are back on your feet, Travis.

Westsail and Pyxey (Joined: 22 Mar 08 · Posts: 65 · Credit: 15,715,071 · RAC: 0)
Message 15020 - Posted: 12 Mar 2009, 15:03:05 UTC
Woke up to an idle machine and falling RAC. Was so close to 30k, too. Is there a way to make BOINC ask for work more often? This machine is dedicated to MW, so I can tweak anything to best accommodate the "uniqueness" of this project. All morning the GPU has been idle more than running, and I keep getting:
Requesting new tasks
Scheduler request completed: got 0 new tasks
Have to say, though, seeing my wife's old Dell Pentium D making 3 times the credit my dual GPU 6600 is, over on cross boinctown, is really... umm... umm... Not sure of the words. Lol at the Chinese "curse": interesting times indeed.

JAMC (Joined: 9 Sep 08 · Posts: 96 · Credit: 336,443,946 · RAC: 0)
Message 15041 - Posted: 12 Mar 2009, 19:08:53 UTC
Travis: Any update on the returning 'got 0 new tasks' problem of the past few days? Thanks! :)

Zanth (Joined: 18 Feb 09 · Posts: 158 · Credit: 110,699,054 · RAC: 0)
Message 15113 - Posted: 13 Mar 2009, 0:28:30 UTC - in response to Message 15041.
I can hardly ever get my GPU machine to get work at all. Now the stupid client is requesting 0 seconds of work, but when I looked in the log, it had been requesting work and getting nothing. :(

Westsail and Pyxey (Joined: 22 Mar 08 · Posts: 65 · Credit: 15,715,071 · RAC: 0)
Message 15115 - Posted: 13 Mar 2009, 0:30:44 UTC - in response to Message 15113.
Maybe your DCF is fubar? Try resetting MW and/or raising its resource share. You still won't always get work, but your client has to be asking for it first.