Welcome to MilkyWay@home

No work



Message boards : Number crunching : No work

Seejay
Joined: 22 Dec 07
Posts: 51
Credit: 2,405,016
RAC: 0
Message 14711 - Posted: 10 Mar 2009, 11:31:12 UTC

Here we go again: Came in from work... both CPU boxes idle - I don't have an ATI card ;^). Everything's been running smoothly since the end of Feb., so what's changed? IMO, as previously stated, the feeder can't keep the scheduler queue sufficiently fed!! The number of GPUs is increasing exponentially, so the scheduler queue must be enlarged.

Not much good for the crunchers, or the project, if we're all waiting for work that's there, but cannot be had!
Seejay **Proud Member and Founder of BOINC Team Allprojectstats.com**
ID: 14711 · Rating: 0
banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 14720 - Posted: 10 Mar 2009, 13:26:12 UTC

Possibly when the stats page reads 648 to send, 800 are being requested, which doesn't allow some to receive work.
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.
ID: 14720 · Rating: 0
JAMC
Joined: 9 Sep 08
Posts: 96
Credit: 336,443,946
RAC: 0
Message 14723 - Posted: 10 Mar 2009, 14:02:21 UTC

And another quad just went dry- that's two in 4-5 hours :(
ID: 14723 · Rating: 0
msattler
Joined: 15 Jul 08
Posts: 288
Credit: 5,474,012
RAC: 0
Message 14724 - Posted: 10 Mar 2009, 14:05:57 UTC - in response to Message 14720.  

Possibly when the stats page reads 648 to send, 800 are being requested, which doesn't allow some to receive work.

That is certainly a possibility....Travis had mentioned increasing the size of the ready-to-send cache.
His last post, 3 days ago, said that he was down with the flu. Hopefully he is feeling better and can respond to this thread soon with some new info.
I am the Kittyman.

Please visit and give a Click for Seti City.




ID: 14724 · Rating: 0
Chris S
Joined: 20 Sep 08
Posts: 1391
Credit: 203,560,157
RAC: 0
Message 14726 - Posted: 10 Mar 2009, 14:17:08 UTC

Hiyah Mark! Nice to see you back. Apparently Travis hasn't been too well, which does account for a lack of response. They say patience is a virtue, but virtues can be fast-tracked sometimes :-)
Don't drink water, that's the stuff that rusts pipes
ID: 14726 · Rating: 0
banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 14731 - Posted: 10 Mar 2009, 14:45:26 UTC - in response to Message 14724.  

Possibly when the stats page reads 648 to send, 800 are being requested, which doesn't allow some to receive work.

That is certainly a possibility....Travis had mentioned increasing the size of the ready-to-send cache.
His last post, 3 days ago, said that he was down with the flu. Hopefully he is feeling better and can respond to this thread soon with some new info.


Maybe the one server is already maxed out? Either giving priority to that function or increasing the cache (2-3x) would work, provided that either can be done.
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.
ID: 14731 · Rating: 0
msattler
Joined: 15 Jul 08
Posts: 288
Credit: 5,474,012
RAC: 0
Message 14760 - Posted: 10 Mar 2009, 18:00:00 UTC - in response to Message 14731.  

Possibly when the stats page reads 648 to send, 800 are being requested, which doesn't allow some to receive work.

That is certainly a possibility....Travis had mentioned increasing the size of the ready-to-send cache.
His last post, 3 days ago, said that he was down with the flu. Hopefully he is feeling better and can respond to this thread soon with some new info.


Maybe the one server is already maxed out? Either giving priority to that function or increasing the cache (2-3x) would work, provided that either can be done.

There might be other possibilities, depending on how close the Milkyway server is to its processing capacity.....
Seti runs separate upload and download servers....on separate computers. Whether this could be done as separate processes on the same computer I am not sure.
Seti runs multiple splitters on the same computer. Maybe this could be done here if it is determined that splitting capacity is part of the problem.

Of course, if the Milkyway server is pretty much at the limits of its processing power, then the only answer is to add another server to share the load and split some of the processes between the two, if Travis wants his project to continue to grow.
I am the Kittyman.

Please visit and give a Click for Seti City.




ID: 14760 · Rating: 0
Lloyd M.
Joined: 1 Dec 08
Posts: 139
Credit: 8,721,208
RAC: 0
Message 14822 - Posted: 10 Mar 2009, 22:23:06 UTC - in response to Message 14760.  

Of course, if the Milkyway server is pretty much at the limits of its processing power, then the only answer is to add another server to share the load and split some of the processes between the two, if Travis wants his project to continue to grow.


Hard telling. I think it's safe to say that SETI has at least a couple of orders of magnitude larger volunteer base than this project. So I don't know if it's a fair comparison. It might be better to find one or more projects with similarly-sized WUs and user base, and obtain information on how they are provisioned and configured on the server end.

I think Travis is continuing in his efforts in finding the bottleneck, and it doesn't appear that he has yet. Clearly, there are a LOT of moving parts here, and (given the symptoms), this could be something as simple as a single router that's not up to the task, or is experiencing intermittent problems.

And (especially given the issues SETI is having), it's not clear to me how easy it actually is to scale BOINC server apps.

Most of my boxes have had a much smaller queue of WUs for this project than usual. This is especially telling, given that I've stepped up processing other projects by a considerable amount. I may have even run out of WUs for this project from time to time.

Interestingly enough, the one box I haven't seen subject to this phenomenon is a dual 2.5 GHz P4 Xeon Ubuntu server. Now, I'm not saying that it never has been - I haven't been watching that closely and it's strictly CLI so it's harder to track. And before I get chastised again for some kind of BOINC malpractice (or ignorance) - BOINCview doesn't want to work with the version of the client that this box has, nor does the version of the client on my main box want to talk to the (much newer) version of the client this box has. And I have other things to do with my time right now than to update the BOINC client on my main box. I'm hoping that the posts I made a while back, when I was still getting WUs no problem when many others were not, and then when I did have some problems, might help Travis with troubleshooting.

A final thought - I just checked, and it appears that the results aren't on "insta-purge", so that should be a great help in tracking down the WU supply problem.
ID: 14822 · Rating: 0
Brian Silvers
Joined: 21 Aug 08
Posts: 625
Credit: 558,425
RAC: 0
Message 14877 - Posted: 11 Mar 2009, 4:25:39 UTC - in response to Message 14822.  


And (especially given the issues SETI is having), it's not clear to me how easy it actually is to scale BOINC server apps.


Their problem has been and continues to be the lack of money. IMO, they should turn off new account creation and focus on getting their server farm stable, then start upgrading equipment with whatever monies they can come up with, still leaving new account creation turned off. Once it appears that there is some breathing room, new account creation could be turned back on...
ID: 14877 · Rating: 0
Alinator
Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0
Message 14878 - Posted: 11 Mar 2009, 5:46:08 UTC

We've been over this already guys. They know what the problem is and SAH has run up against it too.

The scheduler's task queue size is limited by how big a shared memory chunk you can afford to allocate to it.

Apparently though, MW shares its server resources with other users, and Travis said previously that he was going to have to interface with other lab staff to see if they can up the ante, so to speak, to help alleviate the problem on a more permanent basis.

Of course, his coming down with the 'plague' recently hasn't helped matters in this regard. ;-)

Alinator
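
The fixed-size queue described above can be illustrated with a toy model. This is only a sketch and not BOINC server code: the function name `simulate` and the parameters `slots`, `refill_per_tick` and `requests_per_tick` are hypothetical, and request arrivals are assumed to be random bursts. It counts how often a work request finds the ready-to-send cache empty.

```python
import random


def simulate(slots, refill_per_tick, requests_per_tick, ticks, seed=0):
    """Toy model of a fixed-size ready-to-send cache refilled by a feeder.

    All names and rates here are illustrative assumptions, not BOINC settings.
    Returns the fraction of work requests that received zero tasks.
    """
    rng = random.Random(seed)
    ready = slots            # tasks currently sitting in the cache
    empty_replies = 0
    total_requests = 0
    for _ in range(ticks):
        # The feeder tops the cache up, but never beyond its fixed size.
        ready = min(slots, ready + refill_per_tick)
        # Requests arrive in bursts around the average rate.
        burst = rng.randint(0, 2 * requests_per_tick)
        for _ in range(burst):
            total_requests += 1
            if ready > 0:
                ready -= 1          # request served from the cache
            else:
                empty_replies += 1  # "got 0 new tasks"
    return empty_replies / total_requests if total_requests else 0.0


if __name__ == "__main__":
    # Same refill rate and same request pattern, two different cache sizes.
    print("small cache:", simulate(20, 50, 50, 500))
    print("large cache:", simulate(200, 50, 50, 500))
```

In this model a larger cache only absorbs bursts. If the average request rate stays above the refill rate, hosts still get empty replies no matter how big the cache is, which is the other half of the question raised in this thread.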
ID: 14878 · Rating: 0
msattler
Joined: 15 Jul 08
Posts: 288
Credit: 5,474,012
RAC: 0
Message 14879 - Posted: 11 Mar 2009, 5:55:12 UTC - in response to Message 14822.  
Last modified: 11 Mar 2009, 5:55:48 UTC

Of course, if the Milkyway server is pretty much at the limits of its processing power, then the only answer is to add another server to share the load and split some of the processes between the two, if Travis wants his project to continue to grow.


Hard telling. I think it's safe to say that SETI has at least a couple of orders of magnitude larger volunteer base than this project. So I don't know if it's a fair comparison. It might be better to find one or more projects with similarly-sized WUs and user base, and obtain information on how they are provisioned and configured on the server end.

I think Travis is continuing in his efforts in finding the bottleneck, and it doesn't appear that he has yet. Clearly, there are a LOT of moving parts here, and (given the symptoms), this could be something as simple as a single router that's not up to the task, or is experiencing intermittent problems.

And (especially given the issues SETI is having), it's not clear to me how easy it actually is to scale BOINC server apps.


I was not trying to draw a direct comparison between Milkyway and Seti.....
Of course Seti is magnitudes larger than Milkyway. I was simply using Seti as an example of some things that can be done within the Boinc framework to handle a larger user base.

Seti's troubles are their own, and I do not wish to have discussion of their problems here, but if they can scale things to even get close to properly handling their user base (which they do, much of the time), then it certainly is possible to scale Boinc to handle the user base of Milkyway....

I have no idea how heavily loaded the Milkyway server is, what Travis has tried or considered, or what his budget may or may not be.

I am sure that Travis is still looking into the matter, and hope he can give us some new insights into what he has found soon. I wish him all success in his troubleshooting.
I am the Kittyman.

Please visit and give a Click for Seti City.




ID: 14879 · Rating: 0
The Gas Giant
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 14880 - Posted: 11 Mar 2009, 6:18:13 UTC - in response to Message 14879.  

Of course, if the Milkyway server is pretty much at the limits of its processing power, then the only answer is to add another server to share the load and split some of the processes between the two, if Travis wants his project to continue to grow.


Hard telling. I think it's safe to say that SETI has at least a couple of orders of magnitude larger volunteer base than this project. So I don't know if it's a fair comparison. It might be better to find one or more projects with similarly-sized WUs and user base, and obtain information on how they are provisioned and configured on the server end.

I think Travis is continuing in his efforts in finding the bottleneck, and it doesn't appear that he has yet. Clearly, there are a LOT of moving parts here, and (given the symptoms), this could be something as simple as a single router that's not up to the task, or is experiencing intermittent problems.

And (especially given the issues SETI is having), it's not clear to me how easy it actually is to scale BOINC server apps.


I was not trying to draw a direct comparison between Milkyway and Seti.....
Of course Seti is magnitudes larger than Milkyway. I was simply using Seti as an example of some things that can be done within the Boinc framework to handle a larger user base.

Seti's troubles are their own, and I do not wish to have discussion of their problems here, but if they can scale things to even get close to properly handling their user base (which they do, much of the time), then it certainly is possible to scale Boinc to handle the user base of Milkyway....

I have no idea how heavily loaded the Milkyway server is, what Travis has tried or considered, or what his budget may or may not be.

I am sure that Travis is still looking into the matter, and hope he can give us some new insights into what he has found soon. I wish him all success in his troubleshooting.

Well said, Mark.
ID: 14880 · Rating: 0
Lloyd M.
Joined: 1 Dec 08
Posts: 139
Credit: 8,721,208
RAC: 0
Message 14951 - Posted: 11 Mar 2009, 22:27:47 UTC - in response to Message 14879.  
Last modified: 11 Mar 2009, 22:30:20 UTC

msattler wrote:
I have no idea how heavily loaded the Milkyway server is, what Travis has tried or considered, or what his budget may or may not be.


I agree with "The Gas Giant" - well said. Your points are well taken. I'm just trying to be helpful here (not argumentative) by making distinctions and clarifying things.

"Alinator" reports that MW is using part of one server. Yikes! Surely something can be done about this. I've personally had good luck getting perfectly adequate servers from the surplus market. One was an HP-Compaq, dual P4 Xeon 2.5 GHz with HT, 1GB memory and 2 X 36GB HDD (four bays open), and all the hot-swap fans, that I got for $75, which was the minimum eBay bid. This was partially because I could save on shipping by doing local pickup, but I've seen similar machines (albeit with no drives) for similar prices with shipping included.

On my server, I was able to up the memory to 2.5GB for about $40, and I was able to use the 512MB I took out in order to do this in another server. I bought another one for a little more, that had three 36GB and 2 18GB drives. I'm going to run the smaller ones RAID1 on that server, and have moved the three bigger drives to the "main" server, RAID5.

I've found the kicker to be the hard drives. They usually have a lot of hours on them by the time they hit the surplus market, and I believe that this isn't acceptable for mission-critical applications. In other cases, there are no drives or trays included at all (trays can usually be found on ebay for cheap). I keep thinking that there has to be some source for good, hot-swap server drives at reasonable prices (perhaps some kind of overstock in smaller capacities). If you have enough bays (at least a 2U server, or even a 4U server), each drive being on the small side by contemporary standards shouldn't matter.
ID: 14951 · Rating: 0
msattler
Joined: 15 Jul 08
Posts: 288
Credit: 5,474,012
RAC: 0
Message 14996 - Posted: 12 Mar 2009, 6:11:13 UTC - in response to Message 14951.  

msattler wrote:
I have no idea how heavily loaded the Milkyway server is, what Travis has tried or considered, or what his budget may or may not be.


I agree with "The Gas Giant" - well said. Your points are well taken. I'm just trying to be helpful here (not argumentative) by making distinctions and clarifying things.

"Alinator" reports that MW is using part of one server. Yikes! Surely something can be done about this. I've personally had good luck getting perfectly adequate servers from the surplus market. One was an HP-Compaq, dual P4 Xeon 2.5 GHz with HT, 1GB memory and 2 X 36GB HDD (four bays open), and all the hot-swap fans, that I got for $75, which was the minimum eBay bid. This was partially because I could save on shipping by doing local pickup, but I've seen similar machines (albeit with no drives) for similar prices with shipping included.

On my server, I was able to up the memory to 2.5GB for about $40, and I was able to use the 512MB I took out in order to do this in another server. I bought another one for a little more, that had three 36GB and 2 18GB drives. I'm going to run the smaller ones RAID1 on that server, and have moved the three bigger drives to the "main" server, RAID5.

I've found the kicker to be the hard drives. They usually have a lot of hours on them by the time they hit the surplus market, and I believe that this isn't acceptable for mission-critical applications. In other cases, there are no drives or trays included at all (trays can usually be found on ebay for cheap). I keep thinking that there has to be some source for good, hot-swap server drives at reasonable prices (perhaps some kind of overstock in smaller capacities). If you have enough bays (at least a 2U server, or even a 4U server), each drive being on the small side by contemporary standards shouldn't matter.

Thank you Lloyd and Gas Giant....

Unfortunately, Travis has not posted here in 5 days, so it is hard to make much further comment until he has had time to post his thoughts about our observations and suggestions.

I hope it is not still his health that is keeping him from posting.
I am the Kittyman.

Please visit and give a Click for Seti City.




ID: 14996 · Rating: 0
arkayn
Joined: 14 Feb 09
Posts: 999
Credit: 74,932,619
RAC: 0
Message 15009 - Posted: 12 Mar 2009, 12:37:37 UTC - in response to Message 14996.  

Looks like he is feeling a little bit better.

New Searches
March 11, 2009
Sorry for the lack of communication recently. I'm back from my flu so it should be a bit better now :) I started up some new searches, which should hopefully do a better job avoiding the edges of the search space which was causing some of the weird acting very long workunits. Should be starting up a few new types of searches in the next day or so as well, more on that later.

ID: 15009 · Rating: 0
msattler
Joined: 15 Jul 08
Posts: 288
Credit: 5,474,012
RAC: 0
Message 15010 - Posted: 12 Mar 2009, 12:40:24 UTC - in response to Message 15009.  

Looks like he is feeling a little bit better.

New Searches
March 11, 2009
Sorry for the lack of communication recently. I'm back from my flu so it should be a bit better now :) I started up some new searches, which should hopefully do a better job avoiding the edges of the search space which was causing some of the weird acting very long workunits. Should be starting up a few new types of searches in the next day or so as well, more on that later.

That's good news!
I am sure he has some catching up to do, but hopefully he can give us his thoughts about this thread soon.
Glad you are back on your feet Travis.
I am the Kittyman.

Please visit and give a Click for Seti City.




ID: 15010 · Rating: 0
Westsail and *Pyxey*
Joined: 22 Mar 08
Posts: 65
Credit: 15,715,071
RAC: 0
Message 15020 - Posted: 12 Mar 2009, 15:03:05 UTC

Woke up to an idle machine and falling RAC. Was so close to 30k too.

Is there a way to make BOINC ask for work more often? This machine is dedicated to MW, so I can tweak anything to best accommodate the "uniqueness" of this project.

All morning the GPU has been idle more than running, and I keep getting:

Requesting new tasks..
Scheduler request completed: got 0 new tasks...

.
.
Have to say though..seeing my wife's old Dell Pentium D making 3 times the credit my dual GPU 6600 is over cross boinctown is really...umm..umm
Not sure of the words...lol at Chinese "curse"
Interesting times indeed..
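
To put a number on how often the client comes back empty, the two log lines quoted in this post can be counted with a short script. This is a minimal sketch: the pattern is taken only from the "Scheduler request completed: got 0 new tasks" message shown here, the function name `empty_reply_stats` is hypothetical, and real client logs may carry extra fields such as timestamps or project names around that text.

```python
import re

# Pattern based on the message text quoted in the post above; anything
# before or after it on the line (timestamps, project name) is ignored.
TASKS_RE = re.compile(r"Scheduler request completed: got (\d+) new tasks")


def empty_reply_stats(log_text):
    """Return (total_replies, empty_replies) for scheduler replies in a log."""
    total = 0
    empty = 0
    for line in log_text.splitlines():
        match = TASKS_RE.search(line)
        if match is None:
            continue  # not a scheduler reply line
        total += 1
        if int(match.group(1)) == 0:
            empty += 1
    return total, empty


if __name__ == "__main__":
    sample = (
        "Requesting new tasks..\n"
        "Scheduler request completed: got 0 new tasks...\n"
    )
    total, empty = empty_reply_stats(sample)
    print(f"{empty} of {total} scheduler replies were empty")
```

Pasting a morning's worth of messages into `log_text` would show whether empty replies are the exception or the rule on a given host.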

ID: 15020 · Rating: 0
JAMC
Joined: 9 Sep 08
Posts: 96
Credit: 336,443,946
RAC: 0
Message 15041 - Posted: 12 Mar 2009, 19:08:53 UTC

Travis:
Any update on the returning 'got 0 new tasks' problem of the past days?
Thanks!
:)
ID: 15041 · Rating: 0
Zanth
Joined: 18 Feb 09
Posts: 158
Credit: 110,630,897
RAC: 1
Message 15113 - Posted: 13 Mar 2009, 0:28:30 UTC - in response to Message 15041.  

My GPU machine hardly ever gets work at all. Now the stupid client is requesting 0 seconds of work, but when I looked back in the log it was requesting work and getting nothing. :(
ID: 15113 · Rating: 0
Westsail and *Pyxey*
Joined: 22 Mar 08
Posts: 65
Credit: 15,715,071
RAC: 0
Message 15115 - Posted: 13 Mar 2009, 0:30:44 UTC - in response to Message 15113.  

Maybe the DCF (duration correction factor) is fubar?
Try resetting MW and/or upping its resource share.
You still won't always get work, but you need your client to ask.
ID: 15115 · Rating: 0

©2022 Astroinformatics Group