Message boards : Number crunching : Why is it so hard to get work?
Joined: 24 Dec 07 · Posts: 1947 · Credit: 240,884,648 · RAC: 0
This is what I remember Travis posting....

There is no doubt that the amount of science being completed IS less now than before the outage; however, this is NOT due to scripts being used. Something changed on the server side. Before the outage, 9 to 11 WUs per second were being transferred to shared memory, and after the outage only 6-7 WUs are being transferred. It was plain to see on BOINCstats at the time. If the use of scripts were causing the problem, we'd expect to see a further drop-off in WUs being transferred as more people use them.

Can we please stop perpetuating the myth that the use of scripts is causing fewer WUs to be available overall. Their use is most likely the reason why the people not using them are not able to get as much work as they were before, but overall there are around 100 new people joining every day, so there'd be less work to go around in any case..... ;)
Joined: 5 Mar 09 · Posts: 19 · Credit: 102,651,985 · RAC: 0
As I pointed out in this message and to Travis... I agree with this 15-minute minimum time between host contacts...

As a script user, if I decide to stop running the script, the whole project wins a little and I lose a lot (compared to those still running the script). It would be better if every scripter stopped, so that we would all win. But that won't happen, as we're greedy humans. That's why I won't stop running the script. But since I can see it hammers the server, an easy solution would be to set a 15-minute (or whatever) minimum time between host contacts...

Brian Silver's analogy with grocery-store behavior is very accurate. We are living through a famine: everybody wants bread, but there are very few baguettes available. A fight is bound to break out if no rule is established to limit the maximum number of baguettes a good mother can bring home to feed her hungry children.
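To make the proposal above concrete, here is a minimal sketch of a per-host minimum-contact interval, assuming a hypothetical server-side check. The names and the 900-second figure are illustrative only; this is not the BOINC scheduler's or MilkyWay@home's actual code.

```python
import time

# Hypothetical sketch of the proposed rule: a host that asked for work less
# than 15 minutes ago gets deferred instead of served again.
MIN_CONTACT_INTERVAL = 15 * 60  # seconds (the "15 minute or whatever" value)

last_contact = {}  # host_id -> timestamp of the last request that was served


def may_send_work(host_id, now=None):
    """Return True if this host has waited long enough since its last request."""
    now = time.time() if now is None else now
    last = last_contact.get(host_id)
    if last is not None and now - last < MIN_CONTACT_INTERVAL:
        return False  # too soon: tell the host to come back later
    last_contact[host_id] = now
    return True
```

Under a rule like this, hammering the scheduler with repeated requests gains nothing, because every request inside the window is refused regardless of how often it is sent.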
Joined: 21 Aug 08 · Posts: 625 · Credit: 558,425 · RAC: 0
Barring some disaster, it should be 2-4 weeks. At that point, I can sincerely say that I wish all of the GPU folks well with their future endeavors... ;-)

While I don't wish them ill will, I have to admit it's hard to well up a tear when someone is complaining about going without work for a few hours a day, versus those of us who don't get any tasks from here for a couple of days straight and, instead of banging on things to try to squeeze something out of the project, simply decide to get work from somewhere else.
Joined: 21 Aug 08 · Posts: 625 · Credit: 558,425 · RAC: 0
Speaking as a system admin: you can't glean that much from that graph. People often trot those graphs out when there's been a credit change, usually a reduction, to "sic" on the admin... saying "SEE, EVEN AN IDIOT CAN SEE THAT YOU'RE DRIVING PEOPLE AWAY". Without the raw metrics, you're guessing...

Additionally, you're putting your own spin on what Travis said. I didn't read it that way at all. I'm seeking clarification.

Can we please stop perpetuating the myth that the use of scripts is causing fewer WUs to be available overall. Their use is most likely the reason why the people not using them are not able to get as much work as they were before, but overall there are around 100 new people joining every day, so there'd be less work to go around in any case..... ;)

You'd need to take that up with Travis, since the way I read what he said was to the effect that since people have been hammering on the server, the project's productivity has gone down.
Joined: 7 Jun 08 · Posts: 464 · Credit: 56,639,936 · RAC: 0
This is what I remember Travis posting....

Whoa!! Stop the presses.... Where are you getting this analysis of what Travis said? He's not talking about the rate that work is being transferred into the shared memory buffer (IMO). He's talking about the rate that tasks are being returned to the project. The only irrefutable statements in your argument have been that the use of scripting (to jump to the head of the line) does not affect the sum total of all work available, and that the overall daily throughput is down (by 30% from the peak, in credit terms). In fact, it looks to me like you drew your conclusions first and then went looking for proof, rather than the other way around.

Now there are a couple of factors which haven't been considered here and which could have a major impact on the project's overall throughput. The first is that they don't seem to be running as many different searches concurrently as they did before, which means there is a smaller pool of potential candidates to generate new work for. The second is that they all seem to be particle swarm searches. What does this have to do with anything, you might ask? If you recall, the PS searches are genetic-algorithm-style simulations which, as has been stated many times in the past, are highly 'iteration'-sensitive. You cannot generate work for the next round of calculation before you have sufficient returned data for the current round, or you risk having the simulation 'wander off' down an 'undesirable' (read: wrong) path. IOW, if you get too far ahead of yourself, you run the risk of getting doo-doo back.

This means that if you have a battalion of very fast machines grabbing all the work they can, every chance they get, the odds that work generation must stall temporarily are greatly increased. The reason is that there are still far more 'conventional' hosts out there, and they still grab a significant amount of work, even though they might often get shut out on a request or even run dry for periods of time.

Anecdotal evidence from my hosts tends to support this. I have observed that on some occasions a host will get its full request of work, up to the session limit, right away, while at other times it gets one or two at a shot, regardless of how many times you 'pop the update button' as soon as the current 7-second delay expires. This tells me the condition at those times is not that the project can't move tasks into the buffer fast enough, but that it is having 'trouble' finding a candidate to generate work for, and there just isn't anything to send.

This might explain in large part why, historically, simulations like this have been developed and run on homogeneous supercomputers or clusters, like Blue Gene for example. I seem to recall that one of the science goals was to figure out whether it was even worth the trouble of doing this on the loosely coupled, heterogeneous platform that is BOINC (or public DC in general). I'm pretty sure one of the findings is going to be: "Yes, it works fairly well. However, a main consideration in setting up a run is evaluating whether you should band-limit the range of host performance to the type of work being done, in order to achieve optimum task throughput and host utilization."

Alinator
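To illustrate the 'iteration sensitivity' point above, here is a rough, hypothetical sketch of an iteration-gated search. It is not the project's actual work generator; the class, names, population size, and return fraction are all made up for illustration. The point is simply that once the current round's candidates are handed out, further requests get nothing until enough results come back, no matter how hard hosts hammer the server.

```python
import random

POP_SIZE = 200          # candidates per iteration (illustrative)
RETURN_FRACTION = 0.8   # fraction of results needed before the next round


class IterationGatedSearch:
    """Toy model of a particle-swarm-style search that generates work in rounds."""

    def __init__(self):
        self.pending = [random.random() for _ in range(POP_SIZE)]  # unsent candidates
        self.results = []                                          # returned results

    def get_work(self):
        """Hand out one candidate, or None when the current round is exhausted."""
        return self.pending.pop() if self.pending else None  # None -> "no work available"

    def report_result(self, value):
        """Accept a returned result; only then can the next round eventually start."""
        self.results.append(value)
        if not self.pending and len(self.results) >= RETURN_FRACTION * POP_SIZE:
            # Enough of the current round is back: generate the next population.
            self.pending = [random.random() for _ in range(POP_SIZE)]
            self.results = []
```

In a model like this, fast hosts that drain `pending` early simply raise the frequency of "nothing to send" replies for everyone until the slower hosts report back.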
Joined: 29 Sep 08 · Posts: 1618 · Credit: 46,511,893 · RAC: 0
Oh my god. :( Close the thread please! ;-)

It's like a horror movie, isn't it? We hate it but we look at it anyway. Still, even horror movies can get to be too much... day in and day out. ;p

I think I should stay away from reading this thread. It gives such a bad feeling..... There's probably something wrong with me. I just don't fit in. :p Better to go back to other areas and have fun. ;-D
Joined: 21 Aug 08 · Posts: 625 · Credit: 558,425 · RAC: 0
He's not talking about the rate that work is being transferred into the shared memory buffer (IMO). He's talking about the rate that tasks are being returned to the project.

Actually, I'm not sure what he was talking about. There are two main possibilities... either the amount inbound or the amount outbound from the scheduler and file-transfer servers. It's confusing because of the choice of words he used, specifically "getting". That word can mean the amount being received back in, or it could refer to the average outbound rate. I'd really like him to clarify which direction he was talking about...
Honestly, you're giving a great explanation below as to why hammering on the server can be counter-productive. I hope people take the time to read it and think on it...
Again, I hope people read through that...and think on it.
In other words, establish proper quotas and controls so as not to let the end-users overrun you. (Sorry to be blunt about it, but that's what's going on)...
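As a companion to the contact-interval sketch earlier in the thread, here is a minimal sketch of the kind of quota control meant here: a per-host daily task limit. The limit, the names, and the bookkeeping are made up for illustration; the BOINC server has its own configurable quota mechanisms, and this is not that code.

```python
from collections import defaultdict
from datetime import date

DAILY_QUOTA = 48  # hypothetical maximum number of tasks per host per day

sent_today = defaultdict(int)  # (host_id, date) -> tasks already sent


def within_quota(host_id):
    """Return True and count the task if this host may still receive work today."""
    key = (host_id, date.today())
    if sent_today[key] >= DAILY_QUOTA:
        return False  # quota exhausted: no more work until tomorrow
    sent_today[key] += 1
    return True
```

A cap like this bounds how much any single host (scripted or not) can pull in a day, independent of how often it asks.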
Joined: 7 Jun 08 · Posts: 464 · Credit: 56,639,936 · RAC: 0
<snip>

AHHHH.... I see your drift now, sorry. A lot of stuff to absorb quickly in this thread! ;-)

I agree, there should at least be a period before a full 'pilot' test run where the ATi guys have a chance to get something ready for them on the AP. It would be unfair and unwise to fully launch GPU and not have them included.

Alinator
Joined: 7 Jul 08 · Posts: 47 · Credit: 13,629,944 · RAC: 0
Interesting that there seems to be a general consensus on what's "fair" to the GPU folks vis-à-vis the transition, but (as is evident from the continuing argument) a lingering wish by some that using scripts to push to the front of the line (collateral damage and all) could also be "fair". Wouldn't it be agreed that the former reflects regard for the effects of one's actions on fellow participants (good), while the latter reflects disregard for the effects of one's actions on fellow participants (you decide)?

--Bill
Joined: 24 Dec 07 · Posts: 1947 · Credit: 240,884,648 · RAC: 0
Travis said in message 23346:

Actually, the problem isn't that work isn't being generated fast enough; it's that while there's available work, the server can't move it into shared memory fast enough to keep up with work requests (which get work fed from shared memory).

Travis said in message 23369:

To be honest, before people started using scripts to hammer the server, we were getting around 9-11 workunits a second. Now we're seeing around 6-7 workunits a second.

OK, OK. I put 1 and 1 together to get what I said - which I'm pretty sure equals 2. It is irrefutable that immediately prior to the outage more work was being done than immediately after, and that can't be due to a sudden increase in the use of scripts during the outage. Something else had to have happened. In any case it's a futile debate, and we can just wait for the light at the end of the tunnel, which is slowly getting larger.

Live long and BOINC.
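For readers unfamiliar with the architecture the Travis quotes describe, here is a rough sketch of the feeder pattern: one process stages ready work units into a fixed-size shared-memory buffer, and the scheduler serves requests only from that buffer, so the outbound rate is capped by how fast the buffer is refilled. This is a simplified stand-in, not the actual BOINC feeder; the slot count, rate, and names are invented for illustration.

```python
import queue
import threading
import time

SHMEM_SLOTS = 100                          # size of the staging buffer (made up)
shared_buffer = queue.Queue(maxsize=SHMEM_SLOTS)


def feeder(refill_per_second=7):
    """Stage unsent work units into the bounded buffer at a fixed rate."""
    wu_id = 0
    while True:
        shared_buffer.put(f"wu_{wu_id}")   # blocks while the buffer is full
        wu_id += 1
        time.sleep(1.0 / refill_per_second)


def scheduler_request():
    """Serve one work request from the buffer, or nothing if it is empty."""
    try:
        return shared_buffer.get_nowait()
    except queue.Empty:
        return None  # the client sees "no work available"


threading.Thread(target=feeder, daemon=True).start()
```

If the refill rate drops from ~9-11 to ~6-7 WUs per second, total throughput drops with it, and extra client requests only change who gets the limited supply, not how much of it there is.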
Joined: 13 Mar 08 · Posts: 804 · Credit: 26,380,161 · RAC: 0
This thread is being locked due to various complaints.