Message boards : Number crunching : Problem with tiny cache in MW
Paul D. Buck
Joined: 12 Apr 08
Posts: 621
Credit: 161,934,067
RAC: 0
Message 31961 - Posted: 4 Oct 2009, 17:43:33 UTC - in response to Message 31953.  

I wouldn't be too worried here about slow machines getting too much work, as BOINC is supposed not to fetch more work than it can handle anyway.

Well, they added THAT feature in 6.10.4 and 6.10.5 ... :)
ID: 31961

Brian Silvers
Joined: 21 Aug 08
Posts: 625
Credit: 558,425
RAC: 0
Message 31972 - Posted: 5 Oct 2009, 0:21:05 UTC - in response to Message 31953.  
Last modified: 5 Oct 2009, 0:23:41 UTC


I wouldn't be too worried here about slow machines getting too much work, as BOINC is supposed not to fetch more work than it can handle anyway.


Tell that to the people who appear to think that those of us with "slow systems" are the boogeymen then...

And the case you mentioned above: a fast machine getting many WUs and not being able to crunch or report them - well, that's something the deadline has to take care of. Currently each machine could screw up a maximum of 48 WUs this way. This number would obviously increase, so one had to make sure not to send 1 million WUs to a host, no matter what his BOINC requests.


Here is the rub:

The OP said 12 tasks = 12 minutes, ergo 60 tasks per hour, or 60 * 24 = 1440 tasks for a 1-day cache. Only want 3 full hours worth? OK, then that is 480 tasks. Now multiply that by, oh, even 200 hosts (the top 200, which are all GPU) and you have 96,000 tasks consumed by 200 systems like the one the OP mentioned, which just happens to be greater than the total number of tasks in progress at this very moment... So, a 3-hour cache for some folks that want the most points they can get shuts out most everyone else...

Now, if just by chance those 200 systems are along the I-95 corridor from Virginia to Maine on the Eastern seaboard of the United States and you add in one very strong coastal low, aka a "Nor'easter", or other blizzard, like what happened in 1996, 2003, 2005, or 2006, and a majority of those systems lose power, you've just stopped the project in its tracks, even if RPI still had power.

Again, this is all about people's greed without them thinking through the consequences. I'm sure you're sincere, but the plain fact of the matter is that the project is happy with the way things are, else they'd have changed it.
ID: 31972

Brian Silvers
Joined: 21 Aug 08
Posts: 625
Credit: 558,425
RAC: 0
Message 31974 - Posted: 5 Oct 2009, 1:56:54 UTC - in response to Message 31972.  

I math flubbed...

3 hours would be 180... Even still, 180 * 200 = 36,000. More than likely at least the top 500-1000 hosts are GPU, so the point still stands that with just a mere 3-hour cache you'd completely consume all of the tasks that the project has on hand, thus shutting out thousands of other hosts.
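The corrected arithmetic is easy to double-check with a few lines of Python (the 200-host figure is, as above, just an assumption):

```python
# Back-of-the-envelope check of the numbers above: a GPU host doing
# 12 tasks in 12 minutes completes 60 tasks per hour.
tasks_per_hour = 12 / (12 / 60)                # 60 tasks/hour

cache_hours = 3
tasks_per_host = tasks_per_hour * cache_hours  # 180 tasks for a 3-hour cache

gpu_hosts = 200                                # assumed count of top GPU hosts
total = tasks_per_host * gpu_hosts             # 36,000 tasks tied up at once

print(f"{tasks_per_host:.0f} per host, {total:,.0f} total")
```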

People need to realize that this project simply has too much available computing power. If everyone starts fighting for what is available, we'll just end up back where we were with the lengthy work outages...
ID: 31974

Donnie
Joined: 19 Jul 08
Posts: 67
Credit: 272,086,462
RAC: 0
Message 31976 - Posted: 5 Oct 2009, 2:27:12 UTC - in response to Message 31974.  

Wait until the 5870s hit the site!! There are other projects, really!!!
ID: 31976

Paul D. Buck
Joined: 12 Apr 08
Posts: 621
Credit: 161,934,067
RAC: 0
Message 31978 - Posted: 5 Oct 2009, 7:04:57 UTC - in response to Message 31976.  

Wait until the 5870s hit the site!! There are other projects, really!!!

Sadly, at this time there is only one other ATI-capable project. And, many of us are attached there and use that as the back-stop for MW on our ATI cards.

The only good news is that GDF at GPU Grid has said that when OpenCL on ATI cards is usable they will be coming out with an OpenCL version of their application. The only other good news I can think of is that if projects do look about they may see that the ATI world does have much to offer and may be more inclined to support this architecture ... in fact I am surprised that the SaH gurus that have done much of the initial work on optimization have not yet made an ATI version of SaH's executable.

Brian is raising some good points and I know I am one of those that is causing issues with two fairly fast ATI cards chewing through the work here ... (sorry Brian :)) and he is also right that there are competing issues that make this a very complex issue without any easy resolution. LHC had similar problems before it seems to have run permanently out of work, but, to the extent possible I thought that they did try to "spread the wealth" and it seems to me that MW is trying to do the same.

I know when I was doing MW mostly on my CPUs that I wanted a high usage on my machines and at times had trouble keeping work on hand ... it was only when the complexity rose to where the run time on my systems rose above an hour that I was able to keep a reasonable cache though that could still be hard with outages.

This is sadly reminiscent of the time of BOINC Beta when there were only 5 projects and with the problems with the clients and with the site software and the project issues you could still run dry ...
ID: 31978

ExtraTerrestrial Apes
Joined: 1 Sep 08
Posts: 204
Credit: 219,354,537
RAC: 0
Message 31991 - Posted: 5 Oct 2009, 21:12:21 UTC

A very good point about the project not having enough work to fill the caches as much as people would like. Also that many WUs in flight might kill the database. We'd need to go for a short average turn around time - this way caches are kept low. Something around 30 mins, not much more.
This way a single 4870 would get a slightly larger cache than with the standard BOINC client and 4 CPU cores, as well as a slightly smaller one than with a modified BOINC client. CPU guys would probably be restricted to a handful of WUs. But everyone with the balance between GPU power and number of CPU cores tilted further in the direction of GPU power would benefit and could probably reduce the idle time (i.e. 2 cores & 1 GPU or 4/8 cores & several GPUs or 4 cores & a 5870).

There's another option, though:

- leave MW as it is
- modify BOINC so that it acts according to the following rule: "run MW on all ATIs, keep a cache of xxx mins, contact the server at least every yy mins" (this is exactly what Twodee's client does)
- and additionally tell BOINC: "if you can't get new MW WUs and the last cached ones are about to be crunched, activate backup project Collatz. There, fetch at most 2 WUs per GPU. Stop fetching Collatz as soon as you get MW WUs again"
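A toy sketch of these two rules, with made-up names and thresholds (the real BOINC client's work-fetch logic looks nothing like this, which is rather the point):

```python
# Hypothetical work-fetch rule from the two bullets above.
# CACHE_MINS, MIN_LEFT_MINS and BACKUP_MAX are invented for illustration.
CACHE_MINS = 30        # desired MW cache, minutes of GPU work ("xxx")
MIN_LEFT_MINS = 5      # "last cached ones are about to be crunched"
BACKUP_MAX = 2         # at most 2 backup WUs per GPU

def fetch_decision(mw_queue_mins, mw_server_up, backup_queued, gpus=1):
    """Decide what to ask for next: MW first, Collatz only as a stopgap."""
    if mw_server_up and mw_queue_mins < CACHE_MINS:
        return ("MilkyWay", "top up cache")
    if not mw_server_up and mw_queue_mins < MIN_LEFT_MINS:
        want = BACKUP_MAX * gpus - backup_queued
        if want > 0:
            return ("Collatz", want)   # small buffer, dropped once MW is back
    return (None, 0)                   # sit tight
```

The whole policy is a dozen lines; the difficulty is purely that the stock client offers no way to express it.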

Sounds simple, but with a standard BOINC client this is just downright impossible. Simple rules like this are completely unlike what the BOINC scheduler typically obeys, which, IMO, is why it still struggles with GPUs and all the possible hardware / project combinations. In the end this forces us to micro-manage our systems: activate Collatz whenever MW has the smallest hiccup.

At the end of the day fixing BOINC may be much smarter than trying to fix MW.

People need to realize that this project simply has too much available computing power.

This is why I suggested to make the project ATI-only back in January.. ;)

MrS
Scanning for our furry friends since Jan 2002
ID: 31991

verstapp
Joined: 26 Jan 09
Posts: 589
Credit: 497,834,261
RAC: 0
Message 31992 - Posted: 5 Oct 2009, 21:22:59 UTC

Considering the length of the todo list for the boinc devs, good luck in getting DA to agree to this. :)

>too much available computing power.
Well we could all switch to nv and get half the power at twice the price. :p
Cheers,

PeterV

.
ID: 31992

Brian Silvers
Joined: 21 Aug 08
Posts: 625
Credit: 558,425
RAC: 0
Message 31994 - Posted: 5 Oct 2009, 22:33:41 UTC - in response to Message 31991.  
Last modified: 5 Oct 2009, 22:45:12 UTC

A very good point about the project not having enough work to fill the caches as much as people would like.


Yep...

Also that many WUs in flight might kill the database.


Most likely...

We'd need to go for a short average turn around time - this way caches are kept low. Something around 30 mins, not much more.


That'd shut out tons of people. The absolute best I can do with my Athlon64 on the 3s tasks right now is around 40-45 minutes.


CPU guys would probably be restricted to a hand full of WUs.


WE ALREADY ARE!!!!!!!!!!!!!!!!!!!!!!

As I said, my Athlon64, which is about the best performance you're going to find among single-core CPU systems and bottom-end HyperThreaded P4s and Pentium Ds, can only do 1 task in 40-45 minutes. 6 tasks are done in 4-4.5 hours, well under the stated 1 day that the project would like. It just so happens that it isn't fast enough for people who, selfishly, want more for themselves so they can see their points go higher in an already extremely broken credit system. With all the changes up and down over the years, the "leaderboard" is not a true indication of who "leads", only an indicator of who "has more points right now". Long-term leaderboard scores are non-existent due to the continual devaluing of certain projects and overvaluing of others.


At the end of the day fixing BOINC may be much smarter than trying to fix MW.


Exactly. The angst with the work fetch issues needs to be straightened out by the BOINC team. If they do straighten it out though, with all of you piled into this one project, you'll starve out not only those of us with slow systems, but likely each other, since, as I said, there is far too much computing power available here for what the project wants to do / server infrastructure can support...


People need to realize that this project simply has too much available computing power.

This is why I suggested to make the project ATI-only back in January.. ;)


As stated above, with the scheduling fixed, an ATI-only project with the current WUs would end up with all of you just fighting over tasks... It's a losing proposal. The only way you all are going to be happy is if there are substantially longer / complex tasks. With the current tasks, it just is not going to keep you all happy, no matter what.

Again, NO PROJECT, NONE, NADA, ZIP, ZILCH, ZERO, promises that you will have work available 100% of the time or that your computer will be utilized 100% of the time. This is why there are other projects. You now have Collatz, but the two aren't mixing nicely. That's not MW's problem. That's also not my problem or the problem of thousands of other people who do not have the shiny new toys.

Bottom line: All this noise about the plight of GPU users is not really about helping the project. It is really about making oneself feel more important because one sees one's point value go up higher and faster.
ID: 31994

Paul D. Buck
Joined: 12 Apr 08
Posts: 621
Credit: 161,934,067
RAC: 0
Message 31996 - Posted: 5 Oct 2009, 22:53:53 UTC - in response to Message 31991.  

Sounds simple, but with a standard BOINC this is just downright impossible. Simple rules like this are completely unlike what the BOINC scheduler typically obeys. Which, IMO, is why it still struggles with GPUs and all the possible hardware / project combinations. In the end this causes us to micro manage our systems - activate Collatz when ever MW has the smallest hickup.

At the end of the day fixing BOINC may be much smarter than trying to fix MW.

History says this is not going to happen.

Dr. Anderson thinks he knows better how to run BOINC on your system than you do ... or even better than some project might suggest.

He would rather that we micromanage our installations with inadequate use of the available features and/or baby sitting than to add tools and features that would indeed allow us to more or less fire and forget.

Just as one example, we have "return results immediately" as a tool. If I use it on my systems running MW I flood the MW servers with calls I don't want. If I turn it off I can have performance I do not want with GPU Grid where I do want to return results as soon as possible (on my CUDA system). We asked for this sensible change and were told no ... so, do I beat up on MW, or baby-sit my systems?
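For reference — and from memory, so check the BOINC client docs for your version — the switch in question lives in cc_config.xml, and it is a single global flag, which is exactly the problem: there is no per-project form of it.

```xml
<!-- cc_config.xml: this applies to every attached project, not per-project -->
<cc_config>
  <options>
    <report_results_immediately>1</report_results_immediately>
  </options>
</cc_config>
```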

I could go on with other examples...

As to the backlog of work ... one has to ask why after all these years they have not been able to attract more developers to the fold ... I know of several that could help ... and who have tried to help ... so why aren't they helping more? Occam's Razor ... defective volunteers ... or something else ...
ID: 31996

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 501,817,790
RAC: 0
Message 31998 - Posted: 6 Oct 2009, 0:34:56 UTC

Looks like I started a feisty thread:-)

I think for the most part everyone's motivation here is to get things running more smoothly and thus in the end help the project. I know most of the posters from experience with them on various other forums. They're people with a long history of working to improve BOINC as well as helping to test and debug multiple projects, and I am VERY happy to see their participation in this thread. I hope we can dispense with any attacks, innuendos, flames, etc. I don't think that anyone has any secret motives here. Thanks to everyone involved.

Regards/Beyond
ID: 31998

Brian Silvers
Joined: 21 Aug 08
Posts: 625
Credit: 558,425
RAC: 0
Message 32001 - Posted: 6 Oct 2009, 1:58:31 UTC - in response to Message 31998.  
Last modified: 6 Oct 2009, 2:02:30 UTC

I think for the most part everyone's motivation here is to get things running more smoothly and thus in the end help the project.


The problem is that there are serious issues with attempting what has been proposed just so that a few people are better catered to. It will shut out thousands of people, and even if those of us "slowpokes" weren't around, your systems may be so fast that the server infrastructure simply cannot keep up with you, so you may end up with an even worse situation. You could go from 12 tasks an hour to 0 tasks an hour for a few hours a day...

The single best way that those of you with all the horsepower can actually help the PROJECT at this point is to just remain content with the soaring credit that you're getting, no matter if you are only running this project for 12-24 minutes an hour. The problems that exist in mixing other projects will eventually get sorted out. Until then, stop to consider that those of us with those "slow systems" are in the majority and you are in the minority. Those of us with the "slow systems" are not the source of your woes. Those of us with the "slow systems" took the credit cut where you did not. Posting graphs of the active hosts or other tactics to try to club the project over the head and/or hold it hostage, despite what you (general sense) think is actually not "cool".
ID: 32001

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 501,817,790
RAC: 0
Message 32003 - Posted: 6 Oct 2009, 2:28:26 UTC - in response to Message 32001.  

I think for the most part everyone's motivation here is to get things running more smoothly and thus in the end help the project.

Those of us with the "slow systems" are not the source of your woes.

Who said that? Certainly not I.

Posting graphs of the active hosts or other tactics to try to club the project over the head and/or hold it hostage, despite what you (general sense) think is actually not "cool".

Graphs, I see no graphs. I have no idea what you're referring to. Take a deep breath. Listen to the beautiful music. Relax.
ID: 32003

ExtraTerrestrial Apes
Joined: 1 Sep 08
Posts: 204
Credit: 219,354,537
RAC: 0
Message 32047 - Posted: 6 Oct 2009, 20:39:07 UTC

I think Brian is referring mainly to arguments which happened in other threads. The flames went quite high here at MW. So far I'm positive we can keep this thread civil ;)

History says this is not going to happen.

Dr. Anderson thinks he knows better how to run BOINC on your system than you do ... or even better than some project might suggest.


That's why I asked Twodee to include it into his 6.10.x mod, which he'll hopefully do. His 6.6.23 mod is almost perfect, except for times when MW goes down (=micro management) and that the debt build-up can not kill the GPU (his accomplishment) but can strangle the CPU (that's bound to happen on any BOINC prior to 6.10).

Brian wrote:
It will shut out thousands of people


Do you really think that would happen? Do you call it being shut out if all you'd get were 2 MW WUs per core and getting another one as soon as you return one?

More generally you're talking a lot about making GPU WUs more complex. The way I understand it is that currently the MW app evaluates some function, or 2 to 3 function calls with different parameters in the multi stream WUs. The actual search / optimization algorithm is performed on the server and generates these function calls. Up to now this function has not been changed much - that's why they can implement new search schemes without client updates and that's why the WUs for old and new clients are mostly compatible. It's "just" a set of parameters anyway.

So what could it mean to "do more complex work"? You could bundle more than 3 streams / function calls into each WU. Or you could change the actual function. I don't know to what extent the latter is desirable, but I guess it touches the scientific heart of the project - something which is not done in 5 mins, but not impossible either.

The first option could be implemented easily into the current system: just bundle more function calls into each WU for fast hosts and fewer for slow hosts, similarly to how Rosetta lets their users choose a desired run time. But that's really just a tool to enable higher overall throughput without killing the database, it doesn't fundamentally change things.
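A Rosetta-style size preference could be as simple as a lookup plus a sanity cap. The sizes and the turnaround target here are invented for illustration:

```python
# Hypothetical "WU size" preference, in streams (function calls) per WU.
# Sizes and the 30-minute turnaround target are made up for illustration.
WU_SIZES = {"small": 1, "medium": 10, "large": 100}

def streams_for_host(preference, secs_per_stream, target_secs=30 * 60):
    """Honor the user's choice, but never bundle more streams than the
    host can finish within the turnaround target."""
    k = WU_SIZES.get(preference, 1)
    return min(k, max(1, target_secs // secs_per_stream))
```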

MrS
Scanning for our furry friends since Jan 2002
ID: 32047

Brian Silvers
Joined: 21 Aug 08
Posts: 625
Credit: 558,425
RAC: 0
Message 32054 - Posted: 6 Oct 2009, 21:38:45 UTC - in response to Message 32047.  
Last modified: 6 Oct 2009, 21:45:51 UTC


Brian wrote:
It will shut out thousands of people


Do you really think that would happen?


Yes, I do...

Do you call it being shut out if all you'd get were 2 MW WUs per core and getting another one as soon as you return one?


No, I call it a selfish way of trying to shove your (general sense) "problem" onto me and thousands of other people because you (general sense) want more of those precious credits for yourselves (general sense).

I only get a whopping 6 tasks at any one time. I generally have those 6 back within the 1 day that the project had said that they would like to see. So you want to flog me more and take 4 away so that you can have more, even though I'm fully meeting the project's expectations???????????

You keep talking about ways to issue cuts in the amount of work that those of us with CPUs have. Meanwhile, I keep talking about trying to increase the amount of work that both CPU users AND GPU users have.

I don't know how much plainer I can make that. Instead of trying to help yourself only, maybe if you'd try to think about helping everyone....?


More generally you're talking a lot about making GPU WUs more complex. The way I understand it is that currently

...

So what could it mean to "do more complex work"? You could bundle more than 3 streams / function calls into each WU. Or you could change the actual function. I don't know to what extent the latter desireable, but I guess it touches the scientific heart of it - something which is not done in 5 mins, but not impossible either.


What was initially presented for MW_GPU was a situation where GPUs would do tasks of significantly higher complexity. You'd have to ask the project what they had in mind. Also due to the extreme sluggishness of the web server right now, I'm not going to go hunt for the post from either Travis or Dave that mentioned it. I want to say it was Travis, but not sure... Could've been either...

Anyway, as I have said, the project appears to be happy with the current situation, so all this is pointless. However, it does bear noting that generally speaking those of you with GPUs appear to view the rest of us as expendable. It would be nice to not be viewed that way, and it would be nice if a win-win situation could be thought of, rather than a win-lose in your favor...
ID: 32054

Paul D. Buck
Joined: 12 Apr 08
Posts: 621
Credit: 161,934,067
RAC: 0
Message 32059 - Posted: 7 Oct 2009, 0:44:10 UTC

Brian,

I am not sure that the project guys are any more happy about these issues than we ... the GPU version was proposed and they started off in that direction and someone above cut that off ... I don't know if it were the thesis advisors or the university with regard to server loads or what (or both) ...

But it was not by the folks we think of when we speak of the project guys ... like you, I am not going to add to the server load by looking, but ... it is one of those sad but true kinds of stories ...

ID: 32059

Brian Silvers
Joined: 21 Aug 08
Posts: 625
Credit: 558,425
RAC: 0
Message 32062 - Posted: 7 Oct 2009, 1:39:17 UTC - in response to Message 32059.  

Brian,

I am not sure that the project guys are any more happy about these issues than we ... the GPU version was proposed and they started off in that direction and someone above cut that off ... I don't know if it were the thesis advisors or the university with regard to server loads or what (or both) ...

But it was not by the folks we think of when we speak of the project guys ... like you I am not going to add to the sever load looking but... it is one of those sad but true kinds of stories ...



Whatever the reason was, one fact remains:

The tasks that are being run right now were designed with CPUs in mind, not GPUs. This is why the project is so slammed, because the GPUs are just too fast. It's not us with "slow systems" that are causing those with "fast systems" to have problems. It is the server-side infrastructure simply being unable to cope with the strain. All that would happen with those of us CPU users out of the way is the system would run out of work more frequently. Those of us with CPU systems appear to be enough of a brake to keep the server from totally choking due to overwhelming demand.

If the tasks were made 100 times longer and paid 100 times the current credit, there would be absolutely no change in the level of the precious credits that people get. However, it can't be done quickly, or at all if it was stopped by higher-ups, so then it becomes this "trample over the small people"?

People are so greedy... I swear... Over something that is so badly broken too ("cobblestones")...
ID: 32062

ExtraTerrestrial Apes
Joined: 1 Sep 08
Posts: 204
Credit: 219,354,537
RAC: 0
Message 32096 - Posted: 7 Oct 2009, 21:37:10 UTC
Last modified: 7 Oct 2009, 21:40:43 UTC

Brian,

either you totally misunderstand me or you don't trust the project staff to implement my idea properly. What I'm suggesting does contain enough levers to tune the system, even automatically:

1.) no one's going to run out of work
2.) the server load is being kept in check

Whenever you request WUs and you have fewer in stock than the maximum allowed amount for your host, you'll get new ones, up to the maximum. In your case it could look like this: 6 WUs in 1 day means you finish a WU every 4 hours. If the desired turn-around time was less than 8h (which it probably would be) you wouldn't get more than 2 WUs for that single-core machine. However, after 4h you'd return one WU and get another one. How is this different from your current 6-WU cache? Only in the case the project or your internet is down, right? (*)

If I had a cache of a few hours at MW I'd call that "happily crunching" rather than "being shut out". (**)
Does this explain point (1) enough?
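That per-host cap is easy to state as a formula. To be clear, this is a hypothetical rule, not anything the server actually implements:

```python
import math

def max_cached_wus(secs_per_wu, target_turnaround_secs):
    """At most as many WUs as the host can return within the desired
    turnaround; never less than 1, so no host is starved completely."""
    return max(1, math.floor(target_turnaround_secs / secs_per_wu))

# The single-core example above: one WU every ~4 h, 8 h target -> 2 WUs.
print(max_cached_wus(4 * 3600, 8 * 3600))   # 2
```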

Now why wouldn't this strangle the server, as I claim in point (2)? Allowing more of the current WUs for everyone would very likely kill the server, as we agreed upon earlier. Well, I've got a simple suggestion to solve this: make the number of streams within each WU dynamic. If the server gets overloaded by the amount of WUs, increase the number of streams per WU to reduce the WU count. Again, this is something which can be done automatically and could be implemented in various ways.

One could add a user preference for how many streams they prefer, providing e.g. "small", "medium" and "large" WUs. How large these are could be changed by the project over time as hardware evolves. The difference between these types could vary a lot. For example: small = 1 stream for CPUs, medium = 10 for whoever wants it and large = 100 for very fast GPUs. When the server generates work it could bundle streams into these 3 "package types" as needed. And when it hands the work out it knows how much work it put into them.

Or it could work automatically in the background, without a user preference. From various stats the server knows approximately how fast hosts are and accordingly hands them larger or smaller packages.

It could also work like that: if a host reaches the maximum allowed amount of WUs and still has a turn-around time below the limit it could be gradually given larger WUs.
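The automatic variant could be a rough feedback loop like the following, where the in-flight budget and growth factors are invented for illustration:

```python
# Hypothetical server-side loop: keep WUs-in-flight near a budget by
# varying how many streams get bundled into each WU. Illustration only.
MIN_STREAMS, MAX_STREAMS = 1, 100

def adjust_bundle_size(streams_per_wu, wus_in_flight, wu_budget):
    """Bigger WUs when the database is crowded, smaller when it's idle."""
    if wus_in_flight > wu_budget:              # too many rows: grow WUs
        return min(MAX_STREAMS, streams_per_wu * 2)
    if wus_in_flight < wu_budget // 2:         # lots of headroom: shrink
        return max(MIN_STREAMS, streams_per_wu // 2)
    return streams_per_wu
```

Either way, the point is that total throughput stays the same while the number of database rows, and hence the server load, goes down.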

Of course, the separate project offers additional flexibility and I'd like to see it combined with what I suggested here. However, it didn't happen. I don't know the reason, but I suspect it's not just Travis having been in a strange mood. So I'm sceptical it's gonna happen in any short or medium time frame.

MrS

(*) Let's take this thought a step further and be a little perky. We could just as well turn the argument around: the only thing CPU guys would lose after these changes is cache size, with caches probably still being on the order of hours. The poor OP with an ATI and a dual core only gets a cache of 12 mins. Isn't this unfair? Why should he be denied the peace of mind you can have, just because he's got faster hardware?

(**) Being even more perky I could ask: if a cache reduction down to a few hours is being shut out, what'd you call a 12 mins cache?

EDIT:
It's not us with "slow systems" that are causing those with "fast systems" to have problems. It is the server-side infrastructure simply being unable to cope with the strain.


I totally agree with the first point. And the second one actually means that the project should not be satisfied with the way things are currently running, doesn't it? Add in a couple of 5800 series cards and the server may soon be in real trouble again.
Scanning for our furry friends since Jan 2002
ID: 32096

Brian Silvers
Joined: 21 Aug 08
Posts: 625
Credit: 558,425
RAC: 0
Message 32098 - Posted: 7 Oct 2009, 22:07:04 UTC - in response to Message 32096.  
Last modified: 7 Oct 2009, 22:38:07 UTC

Brian,

either you totally misunderstand me


No, you are coming through loud and clear. You want more for you and less for me.


1.) no one's going to run out of work
2.) the server load is being kept in check


Just like nobody ran out of work and the server load was kept in check when the hordes of GPUs hammered it because of the short turnaround time of those tasks yesterday?


In your case it could look like this: 6 WUs in 1 day means you finish a WU every 4 hours.


Nope. I finish the entire 6 in approx 4 hours of CPU time for the 2s tasks, up to 8 hours for the entire 6 for the 3s tasks. It might be longer than that wall-clock time, up to double, depending on system usage. In any case, it is less than 1 day.


If the desired turn around time was less than 8h (which it probably would) you wouldn't get more than 2 WUs for that single core machine. However, after 4h you'd return one WU and get another one.


Nope. I'd return one immediately after it finished, due to how the 3-day deadline automatically makes me report immediately as it is right now. You'd have zero db "savings" by trying to confiscate tasks from me to provide to you.

How is this different from your current 6-WU cache? Only in the case the project or your internet is down, right? (*)


Because I report tasks one-by-one without any intervention due to how things work with this project. So, what your changes would do is make my system still report one-by-one, but request work every 45-90 minutes vs. the 5-6 hour pause as it is now. So much for decreasing the requests for work. There'd actually be more requests... For even slower systems, you'd stack up even more work requests, and since they're shorter apart, you'd introduce even more competition for sparse amounts of work. Great idea. Just great....
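The difference in scheduler contacts is easy to put numbers on, using the intervals from this post (a simplification that ignores the report calls, which happen either way):

```python
def work_requests_per_day(interval_mins):
    """How many scheduler work requests a host makes per day."""
    return 24 * 60 / interval_mins

# Current 6-WU cache: roughly one work request every 5-6 hours.
print(round(work_requests_per_day(5.5 * 60), 1))   # 4.4
# Proposed rolling cache: a request every 45-90 minutes.
print(round(work_requests_per_day(67.5), 1))       # 21.3
```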

Basically, to make a car analogy, this project has an Aston-Martin DBS engine (GPUs) shoved into a VW Beetle and trying to drive in normal traffic, not out on the open road. Your GPUs are so much overkill for this project, but yet you are wanting to stay with the status quo (the current tasks), but want to, in essence, tell the other drivers to get off of the road because you're coming through and you have the rights to all lanes at all times...because, well, because you just think you do...

Again, your desire to have more for you and less for me is a very, very, very selfish position to take. Your position is one of confiscation to suit you, where as my position is one of trying to make things better for everyone. The irony is, the method of confiscation that you propose most likely would make it worse for you...
ID: 32098 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
ExtraTerrestrial Apes

Joined: 1 Sep 08
Posts: 204
Credit: 219,354,537
RAC: 0
200 million credit badge · 10 year member badge
Message 32143 - Posted: 8 Oct 2009, 19:11:44 UTC - in response to Message 32098.  

No, you are coming through loud and clear. You want more for you and less for me.


You know that these sentences cannot both be correct at the same time.

Just like nobody ran out of work and the server load was kept in check when the hordes of GPUs hammered it because of the short turnaround time of those tasks yesterday?


Now what exactly am I proposing? A dumb increase of cache size for everyone?
We need a clever algorithm to achieve what I claimed. We don't have that yet. That's why the server is struggling, and that's what I want to talk about here.

Nope. I'd return one immediately after it finished, due to how the 3-day deadline automatically makes me report immediately as it is right now. You'd have zero db "savings" by trying to confiscate tasks from me to provide to you.
...
Because I report tasks one by one, without any intervention, due to how things work with this project. So what your changes would do is leave my system still reporting one by one, but requesting work every 45-90 minutes instead of the 5-6 hour pause it has now. So much for decreasing the requests for work; there'd actually be more. For even slower systems, you'd stack up even more work requests, and since they're closer together, you'd introduce even more competition for a sparse supply of work. Great idea. Just great....


So basically you're agreeing that for you nothing would really change. The only concern is the server load. I'm glad we got that "being shut out" thing out of the way ;)

Regarding server load, you have to see the big picture. Just one or two posts ago you yourself wrote that the CPUs are not the problem for the server. I'd go one step further and say: even if all CPUs' average time between database requests were reduced from 6h to 45 minutes, it wouldn't really matter for the server.

The reason is that there are machines like the OP's that run dry after 12 minutes. In fact, without any intervention or mods, and without "return results immediately", my BOINC contacts the server every couple of minutes. This is bound to be similar for pretty much every ATI-equipped host (less so nVidia, due to their lower performance). That's where a reduction in the number of server requests really counts. What I propose is like adding 1 request from "slow" hosts while avoiding 10 requests from fast ones.
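That comparison can be put in rough numbers. A minimal sketch follows; all intervals are hypothetical, loosely taken from figures mentioned in this thread (a 6-hour reporting gap for a CPU host, a 45-90 minute gap under the proposed change, a couple of minutes for a fast ATI host), not measured values:

```python
# Back-of-the-envelope comparison of scheduler contact rates per host.
# All interval values below are illustrative assumptions from the thread,
# not official MilkyWay@home or BOINC figures.

def requests_per_day(minutes_between_contacts: float) -> float:
    """Average scheduler contacts per host per day."""
    return 24 * 60 / minutes_between_contacts

cpu_now = requests_per_day(6 * 60)  # CPU host today: one contact per ~6 h
cpu_new = requests_per_day(45)      # same host contacting every ~45 min
gpu_now = requests_per_day(3)       # fast ATI host: every couple of minutes

print(f"CPU host today:     {cpu_now:6.1f} requests/day")   # 4.0
print(f"CPU host, 45-min:   {cpu_new:6.1f} requests/day")   # 32.0
print(f"ATI GPU host today: {gpu_now:6.1f} requests/day")   # 480.0
```

Even if the per-CPU-host rate rose eightfold, a single cache-starved GPU host still generates an order of magnitude more contacts than the changed CPU host, which is the asymmetry the post above is pointing at.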

Again, your desire to have more for you and less for me is a very, very, very selfish position to take. Your position is one of confiscation to suit you, where as my position is one of trying to make things better for everyone. The irony is, the method of confiscation that you propose most likely would make it worse for you...


That's been set in stone for you ever since you read the first post of this thread, hasn't it?

Well, if you really want to make this personal: my ATI actually runs pretty well. The only problem I have is when MW goes down, as I have to manually switch to Collatz to keep my hardware working. Which is tedious, especially since their app used to stop computing after a few hours on my system (haven't tested the new ones, though). I don't have much to gain from increased caches.

The cost of this well-running GPU is a huge amount of server requests, but I can't help that unless the project is changed. I'd be glad if that weren't necessary, for example because in that case there'd be more left for you.

MrS
Scanning for our furry friends since Jan 2002
ID: 32143
Brian Silvers

Joined: 21 Aug 08
Posts: 625
Credit: 558,425
RAC: 0
500 thousand credit badge · 10 year member badge
Message 32167 - Posted: 8 Oct 2009, 23:36:33 UTC - in response to Message 32143.  
Last modified: 8 Oct 2009, 23:49:44 UTC

No, you are coming through loud and clear. You want more for you and less for me.


You know that these sentences cannot both be correct at the same time.


Why? Each and every time I suggest making the work for GPUs more complex, thus taking longer but awarding the same amount of credit per unit time, you retort with "well, why don't we try increasing the amount of work that GPU users have by decreasing the amount of work that CPU users have?"

100 tasks * 53 credits = 5300 credits / 100 minutes = 53 cr/min

1 task * 5300 credits = 5300 credits / 100 minutes = 53 cr/min

Instead of that mathematical truth, you keep coming back with "let's increase the amount of tasks that fast systems have by decreasing the amount that slow systems have."

That is equivalent to your demanding "more for me, less for you"...
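The arithmetic above can be checked in a couple of lines. The 53-credit figure is just the example value used in this post, not an official project rate:

```python
# Credit-rate equivalence: 100 short tasks vs. one task 100x as long
# pay out identically per minute. Figures are the example values from
# the post above, not official MilkyWay@home credit rates.

def credits_per_minute(tasks: int, credits_each: float, total_minutes: float) -> float:
    """Total credit earned divided by the wall-clock minutes spent."""
    return tasks * credits_each / total_minutes

many_short = credits_per_minute(100, 53, 100)  # 100 tasks x 53 cr in 100 min
one_long = credits_per_minute(1, 5300, 100)    # 1 task x 5300 cr in 100 min

print(many_short, one_long)  # 53.0 53.0 -- same rate either way
```

The point being made: consolidating work into longer tasks changes how often the scheduler is contacted, not how much credit per minute any host earns.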

Nope. I'd return one immediately after it finished, due to how the 3-day deadline automatically makes me report immediately as it is right now. You'd have zero db "savings" by trying to confiscate tasks from me to provide to you.
...
Because I report tasks one by one, without any intervention, due to how things work with this project. So what your changes would do is leave my system still reporting one by one, but requesting work every 45-90 minutes instead of the 5-6 hour pause it has now. So much for decreasing the requests for work; there'd actually be more. For even slower systems, you'd stack up even more work requests, and since they're closer together, you'd introduce even more competition for a sparse supply of work. Great idea. Just great....


So basically you're agreeing that for you nothing would really change. The only concern is the server load. I'm glad we got that "being shut out" thing out of the way ;)


Read the bolded and underlined part above, then consider that thousands of additional people all fighting more frequently for fewer tasks, tasks which likely wouldn't even be available because those of you with much faster systems grab them up, equates to being shut out...

Again, increasing the workload on GPUs while paying the same credit rate will mean you get, well, the same credit rate you are receiving now. It also leaves room for others to participate without the project changing things for us again. We took the recent credit cut, you did not. Stop being so short-sighted as to only see the benefit to yourself for grabbing up more tasks so you can get more credit...
ID: 32167


©2020 Astroinformatics Group