No work
Author | Message |
---|---|
Joined: 22 Dec 07 Posts: 51 Credit: 2,405,016 RAC: 0 |
Here we go again: Came in from work... both CPU boxes idle - I don't have an ATI card ;^). Everything's been running smoothly since the end of Feb., so what's changed? IMO, as previously stated, the feeder can't keep the scheduler queue sufficiently fed!! The number of GPUs is increasing exponentially, so the scheduler queue must be enlarged. Not much good for the crunchers, or the project, if we're all waiting for work that's there, but cannot be had! Seejay **Proud Member and Founder of BOINC Team Allprojectstats.com** |
Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
Possibly when the stats page reads 648 to send, 800 are being requested, which doesn't allow some to receive work. Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. |
Joined: 9 Sep 08 Posts: 96 Credit: 336,443,946 RAC: 0 |
And another quad just went dry- that's two in 4-5 hours :( |
Joined: 15 Jul 08 Posts: 288 Credit: 5,474,012 RAC: 0 |
"Possibly when the stats page reads 648 to send, 800 are being requested, which doesn't allow some to receive work." That is certainly a possibility... Travis had mentioned increasing the size of the ready-to-send cache. His last post, 3 days ago, said he was down with the flu. Hopefully he is feeling better and can respond to this thread soon with some new info. I am the Kittyman. Please visit and give a Click for Seti City. |
Joined: 20 Sep 08 Posts: 1391 Credit: 203,563,566 RAC: 0 |
Hiyah Mark! Nice to see you back. Apparently Travis hasn't been too well, which does account for a lack of response. They say patience is a virtue, but virtues can be fast-tracked sometimes :-) Don't drink water, that's the stuff that rusts pipes |
Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
"Possibly when the stats page reads 648 to send, 800 are being requested, which doesn't allow some to receive work." Maybe the one server is already maxed out? Either giving priority to that function or increasing the cache (2-3x) would work, assuming either can be done. Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. |
Joined: 15 Jul 08 Posts: 288 Credit: 5,474,012 RAC: 0 |
"Possibly when the stats page reads 648 to send, 800 are being requested, which doesn't allow some to receive work." There might be other possibilities, depending on how close the Milkyway server is to its processing capacity... Seti runs separate upload and download servers, on separate computers. Whether this could be done as separate processes on the same computer I am not sure. Seti runs multiple splitters on the same computer. Maybe this could be done here if it is determined that splitting capacity is part of the problem. Of course, if the Milkyway server is pretty much at the limits of its processing power, then the only answer is to add another server to share the load and split some of the processes between the two, if Travis wants his project to continue to grow. I am the Kittyman. Please visit and give a Click for Seti City. |
Joined: 1 Dec 08 Posts: 139 Credit: 8,721,208 RAC: 0 |
"Of course, if the Milkyway server is pretty much at the limits of its processing power, then the only answer is to add another server to share the load and split some of the processes between the two, if Travis wants his project to continue to grow." Hard telling. I think it's safe to say that SETI's volunteer base is at least a couple of orders of magnitude larger than this project's, so I don't know if it's a fair comparison. It might be better to find one or more projects with similarly sized WUs and user base, and obtain information on how they are provisioned and configured on the server end. I think Travis is still working on finding the bottleneck, and it doesn't appear that he has found it yet. Clearly, there are a LOT of moving parts here, and (given the symptoms) this could be something as simple as a single router that's not up to the task, or is experiencing intermittent problems. And (especially given the issues SETI is having) it's not clear to me how easy it actually is to scale BOINC server apps. Most of my boxes have had a much smaller queue of WUs for this project than usual. This is especially telling, given that I've stepped up processing other projects by a considerable amount. I may have even run out of WUs for this project from time to time. Interestingly enough, the one box I haven't seen subject to this phenomenon is a dual 2.5 GHz P4 Xeon Ubuntu server. Now, I'm not saying that it never has been - I haven't been watching that closely and it's strictly CLI so it's harder to track. And before I get chastised again for some kind of BOINC malpractice (or ignorance) - BOINCview doesn't want to work with the version of the client that this box has, nor does the version of the client on my main box want to talk to the (much newer) version of the client this box has. And I have other things to do with my time right now than to update the BOINC client in my main box.
I'm hoping that the posts I made a while back, when I was still getting WUs no problem when many others were not, then when I did have some problems, might help Travis with troubleshooting. A final thought - I just checked, and it appears that the results aren't on "insta-purge", so that should be a great help in tracking down the WU supply problem. |
Joined: 21 Aug 08 Posts: 625 Credit: 558,425 RAC: 0 |
Their problem has been and continues to be the lack of money. IMO, they should turn off new account creation and focus on getting their server farm stable, then start upgrading equipment with whatever monies they can come up with, still leaving new account creation turned off. Once it appears that there is some breathing room, new account creation could be turned back on... |
Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
We've been over this already guys. They know what the problem is and SAH has run up against it too. The scheduler's task queue size is limited by how big a shared memory chunk you can afford to allocate to it. Apparently though, MW shares its server resources with other users, and Travis said previously that he was going to have to interface with other lab staff to see if they can up the ante, so to speak, to help alleviate the problem on a more permanent basis. Of course, his coming down with the 'plague' recently hasn't helped matters in this regard. ;-) Alinator |
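A rough way to picture the limit described in the post above: the toy model below is only a sketch, not BOINC server code. The function name, the tick-based timing, and the choice to reuse the 648 and 800 figures guessed at earlier in the thread as the slot count and request rate are all assumptions made for illustration. It shows that once per-tick demand exceeds the fixed number of slots, some requests get nothing even though plenty of work is waiting behind the queue.

```python
def simulate(slots, refill_per_tick, requests_per_tick, ticks, backlog=10**6):
    """Toy model of a fixed-size task queue (hypothetical, not BOINC code).

    Each tick, requests drain the queue first, then a feeder-like step
    tops it back up, never beyond `slots` entries.
    Returns (requests served, requests that got 0 tasks).
    """
    cache = slots  # assume the queue starts full
    served = 0
    starved = 0
    for _ in range(ticks):
        # Each request wants one task; only as many as are queued can be granted.
        granted = min(cache, requests_per_tick)
        served += granted
        starved += requests_per_tick - granted
        cache -= granted
        # Refill is capped by the free slots, the feeder rate, and the backlog.
        refill = min(refill_per_tick, slots - cache, backlog)
        cache += refill
        backlog -= refill
    return served, starved


if __name__ == "__main__":
    # 648 slots against 800 requests per tick: 152 requests per tick go unserved.
    print(simulate(slots=648, refill_per_tick=2000, requests_per_tick=800, ticks=10))
    # Tripling the slots (the "2-3x" idea from the thread) removes the shortfall.
    print(simulate(slots=1944, refill_per_tick=2000, requests_per_tick=800, ticks=10))
```

In this model the refill rate is deliberately generous, so the only bottleneck is the slot count; a slow feeder would produce the same symptom for a different reason.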
Joined: 15 Jul 08 Posts: 288 Credit: 5,474,012 RAC: 0 |
"Of course, if the Milkyway server is pretty much at the limits of its processing power, then the only answer is to add another server to share the load and split some of the processes between the two, if Travis wants his project to continue to grow." I was not trying to draw a direct comparison between Milkyway and Seti... Of course Seti is magnitudes larger than Milkyway. I was simply using Seti as an example of some things that can be done within the Boinc framework to handle a larger user base. Seti's troubles are their own, and I do not wish to have discussion of their problems here, but if they can scale things to even get close to properly handling their user base (which they do, much of the time), then it certainly is possible to scale Boinc to handle the user base of Milkyway... I have no idea how heavily loaded the Milkyway server is, what Travis has tried or considered, or what his budget may or may not be. I am sure that Travis is still looking into the matter, and hope he can give us some new insights into what he has found soon. I wish him all success in his troubleshooting. I am the Kittyman. Please visit and give a Click for Seti City. |
Joined: 24 Dec 07 Posts: 1947 Credit: 240,884,648 RAC: 0 |
"Of course, if the Milkyway server is pretty much at the limits of its processing power, then the only answer is to add another server to share the load and split some of the processes between the two, if Travis wants his project to continue to grow." Well said, Mark. |
Joined: 1 Dec 08 Posts: 139 Credit: 8,721,208 RAC: 0 |
msattler wrote: "I have no idea how heavily loaded the Milkyway server is, what Travis has tried or considered, or what his budget may or may not be." I agree with "The Gas Giant" - well said. Your points are well taken. I'm just trying to be helpful here (not argumentative) by making distinctions and clarifying things. "Alinator" reports that MW is using part of one server. Yikes! Surely something can be done about this. I've personally had good luck getting perfectly adequate servers from the surplus market. One was an HP-Compaq, dual P4 Xeon 2.5 GHz with HT, 1GB memory and 2 x 36GB HDD (four bays open), and all the hot-swap fans, that I got for $75, which was the minimum eBay bid. This was partially because I could save on shipping by doing local pickup, but I've seen similar machines (albeit with no drives) for similar prices with shipping included. On my server, I was able to up the memory to 2.5GB for about $40, and I was able to use the 512MB I took out in order to do this in another server. I bought another one for a little more, which had three 36GB and two 18GB drives. I'm going to run the smaller ones as RAID1 on that server, and have moved the three bigger drives to the "main" server as RAID5. I've found the kicker to be the hard drives. They usually have a lot of hours on them by the time they hit the surplus market, and I believe that this isn't acceptable for mission-critical applications. In other cases, there are no drives or trays included at all (trays can usually be found on eBay for cheap). I keep thinking that there has to be some source for good, hot-swap server drives at reasonable prices (perhaps some kind of overstock in smaller capacities). If you have enough bays (at least a 2U server, or even a 4U server), each drive being on the small side by contemporary standards shouldn't matter. |
Joined: 15 Jul 08 Posts: 288 Credit: 5,474,012 RAC: 0 |
msattler wrote: "I have no idea how heavily loaded the Milkyway server is, what Travis has tried or considered, or what his budget may or may not be." Thank you Lloyd and Gas Giant... Unfortunately, Travis has not posted here in 5 days, so it is hard to make much further comment until he has had time to post his thoughts about our observations and suggestions. I hope it is not still his health that is keeping him from posting. I am the Kittyman. Please visit and give a Click for Seti City. |
Joined: 14 Feb 09 Posts: 999 Credit: 74,932,619 RAC: 0 |
Looks like he's feeling a little bit better. New Searches |
Joined: 15 Jul 08 Posts: 288 Credit: 5,474,012 RAC: 0 |
"Looks like he's feeling a little bit better." That's good news! I am sure he has some catching up to do, but hopefully he can give us his thoughts about this thread soon. Glad you are back on your feet, Travis. I am the Kittyman. Please visit and give a Click for Seti City. |
Joined: 22 Mar 08 Posts: 65 Credit: 15,715,071 RAC: 0 |
Woke up to an idle machine and falling RAC. Was so close to 30k too. Is there a way to make BOINC ask for work more often? This machine is dedicated to MW so I can tweak anything just to best accommodate the "uniqueness" of this project. All morning the GPU has been idle more than running and I keep getting: Requesting new tasks.. Scheduler request completed: got 0 new tasks... . . Have to say though..seeing my wife's old Dell Pentium D making 3 times the credit my dual GPU 6600 is over cross boinctown is really...umm..umm Not sure the words...lol at Chinese "curse" Interesting times indeed.. |
Joined: 9 Sep 08 Posts: 96 Credit: 336,443,946 RAC: 0 |
Travis: Any update on the returning 'got 0 new tasks' problem of the past days? Thanks! :) |
Joined: 18 Feb 09 Posts: 158 Credit: 110,699,054 RAC: 0 |
I can hardly ever get my GPU machine to get work. Now the stupid client is requesting 0 seconds of work, but when I looked further up in the log it was requesting work and getting nothing. :( |
Joined: 22 Mar 08 Posts: 65 Credit: 15,715,071 RAC: 0 |
DCF maybe fubar? Try resetting MW and/or upping the resource share. You still won't always get work, but you need your client to at least ask for it. |
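For readers wondering how a bad DCF (duration correction factor) could lead to a request for 0 seconds of work, here is a deliberately simplified sketch. It is not the BOINC client's actual work-fetch logic; the function name, the one-day buffer, and the task estimates are hypothetical. The idea is only that the client scales its runtime estimates by the DCF, so an inflated DCF makes the existing queue look far longer than it is and the computed shortfall drops to zero.

```python
def work_request_seconds(buffer_seconds, task_estimates, dcf):
    """Simplified, hypothetical model of a work request.

    buffer_seconds: how much work the user wants queued
    task_estimates: raw estimated runtimes (seconds) of tasks already on hand
    dcf: duration correction factor applied to every estimate
    """
    # The queue "looks" as long as the corrected estimates say it is.
    queued = sum(estimate * dcf for estimate in task_estimates)
    # Ask only for the shortfall; never request a negative amount.
    return max(0.0, buffer_seconds - queued)


if __name__ == "__main__":
    tasks = [600] * 4  # four tasks, each estimated at 10 minutes
    # Sane DCF: the queue looks short, so the client asks for most of a day.
    print(work_request_seconds(86400, tasks, dcf=1.0))
    # Inflated DCF: the same four tasks look like over a day of work.
    print(work_request_seconds(86400, tasks, dcf=50.0))
```

Under this model, resetting the project helps only because it discards the inflated factor; it does nothing about the server-side shortage discussed earlier in the thread.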