Quads waiting on the Server

Message boards : Number crunching : Quads waiting on the Server

voltron
Joined: 30 Mar 08
Posts: 50
Credit: 11,593,755
RAC: 0
Message 3114 - Posted: 12 Apr 2008, 2:21:26 UTC

I just attached a new build to the project. It's a Q6600 running at 3.6 GHz. While I appreciate the project's quota of 20 WUs, this rig rips through the 20 in less time than the scheduler allows before refilling my cache. Is there a machine-to-machine learning curve in progress, or can I expect this dead air between the 20-packs? My other option is to throttle back to pace with the server.

What's the skinny?

Voltron
ID: 3114
Dave Przybylo
Joined: 5 Feb 08
Posts: 236
Credit: 49,648
RAC: 0
Message 3115 - Posted: 12 Apr 2008, 2:37:37 UTC - in response to Message 3114.  
Last modified: 12 Apr 2008, 2:39:28 UTC

I just attached a new build to the project. It's a Q6600 running at 3.6 GHz. While I appreciate the project's quota of 20 WUs, this rig rips through the 20 in less time than the scheduler allows before refilling my cache. Is there a machine-to-machine learning curve in progress, or can I expect this dead air between the 20-packs? My other option is to throttle back to pace with the server.

What's the skinny?

Voltron


Well, we're trying to build a scheduler that allows for a WU limit per core. This was supposed to be done in the upgrade; however, some files did not get upgraded properly and reverted to the old version. We'll keep you updated on when we get it working. Shouldn't be too long. Until then, you're free to throttle back and place your resources on other projects where they will be used.
Dave Przybylo
MilkyWay@home Developer
Department of Computer Science
Rensselaer Polytechnic Institute
ID: 3115
Jayargh
Joined: 8 Oct 07
Posts: 289
Credit: 3,690,838
RAC: 0
Message 3116 - Posted: 12 Apr 2008, 2:39:12 UTC - in response to Message 3114.  

I just attached a new build to the project. It's a Q6600 running at 3.6 GHz. While I appreciate the project's quota of 20 WUs, this rig rips through the 20 in less time than the scheduler allows before refilling my cache. Is there a machine-to-machine learning curve in progress, or can I expect this dead air between the 20-packs? My other option is to throttle back to pace with the server.

What's the skinny?

Voltron


Dead air... consider the behaviour of an 8- or 16-core machine here. Most people actually run a 2nd project so there is no dead air, but how efficiently you tweak your BOINC manager determines how little that 2nd project crunches.

ID: 3116
Webmaster Yoda
Joined: 21 Dec 07
Posts: 69
Credit: 7,048,412
RAC: 0
Message 3117 - Posted: 12 Apr 2008, 3:33:51 UTC
Last modified: 12 Apr 2008, 3:44:54 UTC

This question keeps rearing its head.

But if you can't set the server up to issue 10, 20 or whatever number of work units per CORE, why not reduce (or eliminate) the "deferring communication for 20 minutes" delay (as I mentioned in a post of 20 January)?

Set it to something like 5 minutes perhaps, rather than the current 20 minutes.

It's a setting in the server's BOINC configuration.

See also this post (quoted below)

I'm no expert on the server-side options of BOINC, but a search of the BOINC site shows the following seemingly relevant config options (at http://boinc.berkeley.edu/trac/wiki/ProjectOptions):

<max_wus_in_progress> N </max_wus_in_progress>
<min_sendwork_interval> N </min_sendwork_interval>

Here's what it says about that last option:

min_sendwork_interval
Minimum number of seconds to wait after sending results to a given host, before new results are sent to the same host. Helps prevent hosts with download or application problems from trashing lots of results by returning lots of error results. But don't set it to be so long that a host goes idle after completing its work, before getting new work.


What we are seeing on fast hosts, particularly with short (2-credit) work units, is exactly as described - they run out of work before they are allowed to connect again... This may become even more prevalent if the new applications are faster.

Perhaps if that were set to 5 or 10 minutes, things would run more smoothly.
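
For illustration, here is a minimal sketch of how those two options might sit in a project's config.xml. The values shown (a 5-minute interval and a cap of 20 results per host) are assumptions for the sake of the example, not MilkyWay@home's actual settings; the ProjectOptions page linked above is the authoritative reference.

    <boinc>
        <config>
            <!-- minimum seconds between sending results to the same host;
                 300 = 5 minutes, instead of the current 20-minute deferral -->
            <min_sendwork_interval>300</min_sendwork_interval>
            <!-- cap on unfinished results a single host may hold at once
                 (at this point this appears to be a per-host total, not per core) -->
            <max_wus_in_progress>20</max_wus_in_progress>
        </config>
    </boinc>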

Join the #1 Aussie Alliance on MilkyWay!
ID: 3117
Jayargh
Joined: 8 Oct 07
Posts: 289
Credit: 3,690,838
RAC: 0
Message 3118 - Posted: 12 Apr 2008, 3:48:38 UTC

Yoda - If I remember right, Travis tried that change on the old server version and said in a post (not going to look) that it didn't work or change anything... so either the old code or something else was overriding it. Might be worth trying again :)
ID: 3118
Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 3125 - Posted: 12 Apr 2008, 12:40:15 UTC - in response to Message 3118.  

Yoda - If I remember right, Travis tried that change on the old server version and said in a post (not going to look) that it didn't work or change anything... so either the old code or something else was overriding it. Might be worth trying again :)


The new server SHOULD have fixed the communication deferral problem. I take it that it hasn't :( Hopefully we're going to be swapping to a WU-per-core limit when everything is updated. When we do that, I think we'll actually drop the limit down to maybe 5-10 per core (which should be enough to keep machines full), and then we'll get better search results because most results will have a faster turnaround.
ID: 3125
Jayargh
Joined: 8 Oct 07
Posts: 289
Credit: 3,690,838
RAC: 0
Message 3126 - Posted: 12 Apr 2008, 15:30:52 UTC - in response to Message 3125.  
Last modified: 12 Apr 2008, 15:46:29 UTC

Yoda - If I remember right, Travis tried that change on the old server version and said in a post (not going to look) that it didn't work or change anything... so either the old code or something else was overriding it. Might be worth trying again :)


The new server SHOULD have fixed the communication deferral problem. I take it that it hasn't :( Hopefully we're going to be swapping to a WU-per-core limit when everything is updated. When we do that, I think we'll actually drop the limit down to maybe 5-10 per core (which should be enough to keep machines full), and then we'll get better search results because most results will have a faster turnaround.


My opinion, from the last few months' discussion, is that 5 per core and a 10-minute RPC interval would be ideal. No new host built should run out of work that way with the current WU length, and if one did, the settings could be slightly tweaked, but this is probably about optimal for both project and user :)
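
As a rough sanity check of that pairing (the numbers here are illustrative assumptions, not measured MilkyWay@home figures): if a fast quad clears its 20-pack in under 20 minutes, each core averages roughly 4 minutes or less per work unit, so 5 cached work units per core is on the order of 20 minutes of work per core, comfortably more than a 10-minute deferral. In config.xml terms the interval half of the suggestion might look like the fragment below; the 5-per-core cap itself would need the planned per-core scheduler change, since max_wus_in_progress appears to have been a per-host total at the time.

    <!-- back-of-envelope: ~4 min per WU per core x 5 WUs per core
         = ~20 min of cached work per core, which covers a 10-minute deferral -->
    <min_sendwork_interval>600</min_sendwork_interval>  <!-- 10 minutes -->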

How soon until we start working longer units?
ID: 3126
voltron
Joined: 30 Mar 08
Posts: 50
Credit: 11,593,755
RAC: 0
Message 3129 - Posted: 13 Apr 2008, 3:44:50 UTC

Thanks for the feedback. Sounds like I eat it (the dead air) until RPI diddles the code. There is some compensation (cold): I have an E4500 on a pathetic Biostar board and they do not play nice together, so that rig runs cold. Time for a new (refurb) mobo. I appreciate your posts.

Voltron
ID: 3129
voltron
Joined: 30 Mar 08
Posts: 50
Credit: 11,593,755
RAC: 0
Message 3288 - Posted: 23 Apr 2008, 1:53:29 UTC - in response to Message 3129.  

Thanks for the feedback. Sounds like I eat it (the dead air) until RPI diddles the code. There is some compensation (cold): I have an E4500 on a pathetic Biostar board and they do not play nice together, so that rig runs cold. Time for a new (refurb) mobo. I appreciate your posts.

Voltron


I throttled the Q6600 back to 2.8 GHz, and that roughly matches the server's pace at handing out 20-packs. This is the processor involved in the electrical fire incident. Luckily, the only component damaged was the motherboard. I am still in the process of finding a new motherboard. The most recent replacement (non-incendiary) was DOA, so I switched the proc into one of my dual-core rigs. It was a DFI 965P refurb from NE; it would spin up, but no POST. The BIOS was not ready for the show.

Voltron
ID: 3288
