Welcome to MilkyWay@home

Problem with tiny cache in MW

Message boards : Number crunching : Problem with tiny cache in MW

AuthorMessage
banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 31898 - Posted: 3 Oct 2009, 0:16:33 UTC - in response to Message 31894.  

I've thought about CPDN, but I have a feeling something would happen on my computer and I wouldn't get any credit. :p I've stuck with Rosetta 10-hour units for a year+ now and they aren't too bad to run. But lately my luck has been that when both have problems I run out of work. I imagine when I get a new computer I will add projects to use the GPU at least. By then GPU projects should be figured out better.


About MW: I think it would help for now at least (assuming other types of work need to be done) to add GPU-specific WUs that run many times longer, either here or at MW GPU, to alleviate some of these problems. Say an hour a unit would let GPUs have hours of work instead of just a few minutes. ((I know this has been said by myself and others already))
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.
verstapp
Joined: 26 Jan 09
Posts: 589
Credit: 497,834,261
RAC: 0
Message 31899 - Posted: 3 Oct 2009, 0:34:11 UTC

Due to its long WUs, multi-hundred to multi-thousand hours [running 24/7 on a fast CPU] depending on the type of WU, CPDN gives credit based on trickles, small "hey I'm still processing" messages sent back to Oxford every x timesteps. So your average i7, even if it doesn't complete the WU [models die for all sorts of reasons] [by default running one per core, ie 8 WUs simultaneously] should get about 4,500 rocks per day in credit.
Cheers,

PeterV

Brian Silvers
Joined: 21 Aug 08
Posts: 625
Credit: 558,425
RAC: 0
Message 31902 - Posted: 3 Oct 2009, 0:49:26 UTC - in response to Message 31898.  
Last modified: 3 Oct 2009, 0:50:57 UTC

I've thought about Cpdn, but I have a feeling something would happen on my computer and I wouldn't get any credit. :p


I've seen tasks that have error conditions where people were still granted credit. They award credit on trickles, not at task completion, so you could be 200 hours into a task and it errors out and you would've gotten credit for all the trickles prior to the erroring out...


I've stuck with Rosetta 10-hour units for a year+ now and they aren't too bad to run. But lately my luck has been that when both have problems I run out of work. I imagine when I get a new computer I will add projects to use the GPU at least. By then GPU projects should be figured out better.


Probably. I think what a lot of the more competitive people haven't understood is that they're on the bleeding edge. There will be problems for a period of time. Getting all huffy about it is one's right, but it isn't very wise to try something relatively new and then get all bent out of shape when it doesn't go completely smoothly.

Rosetta may be a better choice for me as well. I'm not big on the whole "climate change" crowd... There have been warnings of "global cooling" which were replaced a few years later with those of "global warming". I think we're far too ignorant as a species to totally understand the biosphere interaction on a planetary scale at this point...


About MW: I think it would help for now at least (assuming other types of work need to be done) to add GPU-specific WUs that run many times longer, either here or at MW GPU, to alleviate some of these problems. Say an hour a unit would let GPUs have hours of work instead of just a few minutes. ((I know this has been said by myself and others already))


That's one option I presented...the longer tasks. As long as they have the same credit ratio, then I don't see why that should be so controversial, but that's just from my expendable CPU point of view...
Paul D. Buck
Joined: 12 Apr 08
Posts: 621
Credit: 161,934,067
RAC: 0
Message 31906 - Posted: 3 Oct 2009, 2:47:40 UTC - in response to Message 31894.  

I dunno... I think I need to either give up on BOINC or move to a project like CPDN where tasks can just run for real long amounts of time and where the competitive sorts don't comprise a large percentage of the user base...

Does that mean I need to leave? :)
Brian Silvers
Joined: 21 Aug 08
Posts: 625
Credit: 558,425
RAC: 0
Message 31907 - Posted: 3 Oct 2009, 3:50:23 UTC - in response to Message 31906.  

I dunno... I think I need to either give up on BOINC or move to a project like CPDN where tasks can just run for real long amounts of time and where the competitive sorts don't comprise a large percentage of the user base...

Does that mean I need to leave? :)


You know what I meant...

What's the deal with Orbit? Has Pasqualle (I think that's his name) just given up? I don't know why a project like that can't get proper funding. More people being extremely short-sighted I guess... Roads and bridges won't matter to a large chunk of rock that causes an ELE... I guess people are just willing to bet that it won't happen in their lifetime... If it does, then the outcry will be "why didn't we know about this before it is too late", when astrophysicists tell people that they have about 2-6 months before they die...and there's nothing that can be done to stop it... Of course, one could argue that we're at that point right now and it's better to not know...but I'm sure that there would be some detection of such an event with current technology, even if it was only a couple of days...
Paul D. Buck
Joined: 12 Apr 08
Posts: 621
Credit: 161,934,067
RAC: 0
Message 31913 - Posted: 3 Oct 2009, 8:32:30 UTC - in response to Message 31907.  

I dunno... I think I need to either give up on BOINC or move to a project like CPDN where tasks can just run for real long amounts of time and where the competitive sorts don't comprise a large percentage of the user base...

Does that mean I need to leave? :)


You know what I meant...

Yes, I know ... sorry about the black humor ...
The Gas Giant
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 31920 - Posted: 3 Oct 2009, 13:00:56 UTC - in response to Message 31913.  

I dunno... I think I need to either give up on BOINC or move to a project like CPDN where tasks can just run for real long amounts of time and where the competitive sorts don't comprise a large percentage of the user base...

Does that mean I need to leave? :)


You know what I meant...

Yes, I know ... sorry about the black humor ...

It's always good to have some humour around....keeps things from getting too serious.
Chris S
Joined: 20 Sep 08
Posts: 1391
Credit: 203,563,566
RAC: 0
Message 31922 - Posted: 3 Oct 2009, 13:46:15 UTC

It's always good to have some humour around....keeps things from getting too serious.


I would say it's essential! :-))
Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 729,293,740
RAC: 0
Message 31928 - Posted: 3 Oct 2009, 20:12:10 UTC - in response to Message 31810.  

The ideal solution would be a perfect balance between latency and throughput. This is something only the project staff could determine: if they'd like to run more searches in parallel, they need higher throughput. If they're waiting for results to return they'd need faster turn around times / lower latencies.

That leads me to the conclusion that it would be a good idea to tie the allowed number of concurrent WUs to a host's average turnaround time, a value easily accessible to the server. The project could set the desired value quite flexibly / dynamically (within sane limits), depending on the current search state. The allowed WUs per host could be updated daily to keep the number of additional database requests low.

Imagine the situation the Op is in: he's got a dual core, so currently a maximum of 12 WUs. He says for him that lasts about 12 min. For me it would be 8.6 mins. For a 5870 it should last just 4 mins. This is actually causing us to hammer the server or to run idle - neither of which does the project any good. If the cache size was increased for fast hosts only, the number of server requests / database accesses could be reduced.

MrS

OK, back to the original topic. This proposal by ET is still the best solution I've seen: allowing a WU cache based on the average turnaround time of the host. It would also most likely have the benefit of reducing the load on the server caused by constant client work requests. Travis, what do you think about this idea?
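As a rough server-side sketch of how ET's turnaround-based cache might work (the function name, the 1-day target, and the 6/500 limits are illustrative assumptions, not actual MilkyWay@home scheduler code):

```python
def allowed_wus(avg_turnaround_days: float,
                target_latency_days: float = 1.0,
                min_wus: int = 6,
                max_wus: int = 500) -> int:
    """Scale a host's WU quota inversely with its average turnaround.

    Hosts that return results quickly earn a deeper cache; slow hosts
    fall back to the floor, so nobody is shut out entirely.
    """
    if avg_turnaround_days <= 0:
        return min_wus
    # Faster-than-target hosts earn proportionally more work.
    scale = target_latency_days / avg_turnaround_days
    return max(min_wus, min(max_wus, int(min_wus * scale)))

# A GPU host turning work around in ~0.01 days hits the cap:
fast = allowed_wus(0.01)   # → 500
# A CPU host near the 3-day deadline keeps the minimum:
slow = allowed_wus(2.72)   # → 6
```

The clamp at both ends is the point: the cache grows only for hosts whose history proves they return work fast, while everyone else keeps today's behavior.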

Brian Silvers
Joined: 21 Aug 08
Posts: 625
Credit: 558,425
RAC: 0
Message 31934 - Posted: 4 Oct 2009, 1:25:45 UTC - in response to Message 31928.  
Last modified: 4 Oct 2009, 1:28:56 UTC


OK, back to the original topic. This proposal by ET is still the best solution I've seen yet. Allowing a WU cache based on the average turn around time of the host. It would also most likely have the benefit of reducing the load on the server caused by constant client work requests. Travis, what do you think about this idea?


Of course you like the idea. Your turn around time will be just fine. You won't get shut out of participating because someone simply wants more for themselves...

Explain to me why, if the project were to need more complex work done, that giving you the more complex work while awarding the same credit per unit time is not a good thing.
Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 729,293,740
RAC: 0
Message 31935 - Posted: 4 Oct 2009, 3:45:20 UTC - in response to Message 31934.  
Last modified: 4 Oct 2009, 3:47:45 UTC

The ideal solution would be a perfect balance between latency and throughput. This is something only the project staff could determine: if they'd like to run more searches in parallel, they need higher throughput. If they're waiting for results to return they'd need faster turn around times / lower latencies.

That leads me to the conclusion that it would be a good idea to tie the allowed number of concurrent WUs to a host's average turnaround time, a value easily accessible to the server. The project could set the desired value quite flexibly / dynamically (within sane limits), depending on the current search state. The allowed WUs per host could be updated daily to keep the number of additional database requests low.

Imagine the situation the Op is in: he's got a dual core, so currently a maximum of 12 WUs. He says for him that lasts about 12 min. For me it would be 8.6 mins. For a 5870 it should last just 4 mins. This is actually causing us to hammer the server or to run idle - neither of which does the project any good. If the cache size was increased for fast hosts only, the number of server requests / database accesses could be reduced.

MrS

OK, back to the original topic. This proposal by ET is still the best solution I've seen yet. Allowing a WU cache based on the average turn around time of the host. It would also most likely have the benefit of reducing the load on the server caused by constant client work requests. Travis, what do you think about this idea?

Of course you like the idea. Your turn around time will be just fine. You won't get shut out of participating because someone simply wants more for themselves...

ET's whole point is that everyone would have a cache of the same size timewise. No one would be shut out. Right now the project has to wait for slow machines with large caches. Machines with fast turnaround are the ones being shut out because the 12 minute cache is not enough to keep the GPU fed. So they go elsewhere (Collatz at the moment). I can't imagine you believe that is good for the project. I wish the admins would post their thoughts on the subject...
Brian Silvers
Joined: 21 Aug 08
Posts: 625
Credit: 558,425
RAC: 0
Message 31937 - Posted: 4 Oct 2009, 4:22:11 UTC - in response to Message 31935.  
Last modified: 4 Oct 2009, 4:41:42 UTC


Right now the project has to wait for slow machines with large caches.


There is no such thing.

Some of the slowest computers attached that I'm aware of belong to Alinator, and I hope he doesn't get mad at me for pointing this out.

Anyway, if you look at the older machines, you will notice they have a WHOPPING whole two tasks in progress at most, many of them either one or none. Slow machines simply can't build up a "large cache" because BOINC will automatically scale them down, telling them that the tasks won't finish in time. The actual worst-case is that someone with a FAST machine, like an i7 without a cuda/ati card comes along and stocks up on 20+ tasks and does nothing with them for the 3 days...

Machines with fast turnaround are the ones being shut out because the 12 minute cache is not enough to keep the GPU fed.


When I say "shut out", I mean totally unable to participate, no matter if in 12 minutes, 12 days, 12 months, or 12 years.

The idea proposed was that systems that are BELOW the average turnaround time DO NOT GET WORK. It wasn't that they get a few here and there. It's that they get NONE AT ALL.

There actually already is a mechanism in place that ends up making it to where hosts that don't report tasks in the 3 days get ratcheted down in the amount of tasks they can get (quota), so that is already in place.

Also, the project appears completely satisfied with the rate of return...

What this seems to really be about is not being able to "hog" tasks for oneself... Maybe you don't intend it to be that way. Maybe you are unaware of the realities of some of the mechanisms already in place. However, that's how it is coming across to me right now...

Again, there is a fine line between being "competitive" and coming off as greedy. Several of you are crossing that line... When this happened before, the response was "well, we have nowhere else to go". Now you do. If the project isn't suiting you, then go elsewhere. I've done that with 2 projects, and a 3rd one on one system.
Odd-Rod
Joined: 7 Sep 07
Posts: 444
Credit: 5,712,451
RAC: 0
Message 31939 - Posted: 4 Oct 2009, 6:54:27 UTC - in response to Message 31937.  


Anyway, if you look at the older machines, you will notice they have a WHOPPING whole two tasks in progress at most, many of them either one or none. Slow machines simply can't build up a "large cache" because BOINC will automatically scale them down, telling them that the tasks won't finish in time.

I can confirm this. I have a PII 400MHz that takes around 1 day 19 hours to do a MW WU. The only time I see it with 2 WUs is shortly before the crunching one finishes. This host returns valid results before the deadline. The average turnaround for it is 2.72 days. A little tight on the 3 day deadline, but the max daily quota is 4999/day, so there can't be many problems.


The actual worst-case is that someone with a FAST machine, like an i7 without a cuda/ati card comes along and stocks up on 20+ tasks and does nothing with them for the 3 days...

Agreed. While I'm sure that faster hosts don't have turnarounds of 2.75 days, it would be interesting to know how long the project waits for results.

The idea proposed was that systems that are BELOW the average turnaround time DO NOT GET WORK. It wasn't that they get a few here and there. It's that they get NONE AT ALL.

What average turnaround time would that be? The current one? Or would it be regularly recalculated? That would be a problem because by eliminating hosts below the average, the average turnaround time would get shorter, meaning more hosts would need to be eliminated. And of course the average would get shorter again. Eventually only the host with the fastest turnaround would be left. Hey, we can save time and eliminate all but that one host right now! ;)
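That runaway-average point can be demonstrated with a toy simulation (made-up turnaround times in days, lower is faster): repeatedly dropping every host slower than the current mean whittles the pool down to the single fastest host.

```python
def cull_below_average(turnarounds):
    """Repeatedly remove hosts slower than the mean turnaround.

    Returns the survivors; with distinct turnaround times, only the
    single fastest host is ever left standing.
    """
    hosts = list(turnarounds)
    while True:
        mean = sum(hosts) / len(hosts)
        survivors = [t for t in hosts if t <= mean]
        if len(survivors) == len(hosts):
            return hosts  # fixed point reached: nobody left to cull
        hosts = survivors

# Each pass raises the bar, until one host remains:
print(cull_below_average([0.02, 0.5, 1.0, 2.0, 2.72]))  # → [0.02]
```

Unless recalculation stops at some floor, the policy converges on exactly the "eliminate all but one host" absurdity described above.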


Also, the project appears completely satisfied with the rate of return...

Exactly! If they want results sooner they can shorten the deadlines, but they will lose crunching power.
Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 729,293,740
RAC: 0
Message 31940 - Posted: 4 Oct 2009, 7:27:16 UTC - in response to Message 31939.  

The idea proposed was that systems that are BELOW the average turnaround time DO NOT GET WORK. It wasn't that they get a few here and there. It's that they get NONE AT ALL.

What average turnaround time would that be? The current one? Or would it be regularly recalculated? That would be a problem because by eliminating hosts below the average, the average turnaround time would get shorter, meaning more hosts would need to be eliminated.

Hi Rod, I don't think this was ever proposed, just something someone thought up and decided to knock down. Have no idea why. Here are two of the proposed ideas:

From ET:

The ideal solution would be a perfect balance between latency and throughput. This is something only the project staff could determine: if they'd like to run more searches in parallel, they need higher throughput. If they're waiting for results to return they'd need faster turn around times / lower latencies.

That leads me to the conclusion that it would be a good idea to tie the allowed number of concurrent WUs to a host's average turnaround time, a value easily accessible to the server. The project could set the desired value quite flexibly / dynamically (within sane limits), depending on the current search state. The allowed WUs per host could be updated daily to keep the number of additional database requests low.

MrS

The other by Crunch3r:

No-one wants to single out the CPUs. What we need is max_jobs_on_host_gpu=200 ...
that should reduce the server load a bit since the clients won't hammer it every 60 sec...

BarryAZ
Joined: 1 Sep 08
Posts: 520
Credit: 302,524,931
RAC: 15
Message 31941 - Posted: 4 Oct 2009, 7:39:57 UTC - in response to Message 31937.  

Hmm -- if all systems that are below the average turnaround were locked out, figuring out the logic, that would mean only the *one* system with the fastest turnaround time would get work.

If there were two systems connected, one slower, and one faster, the slower one would be below the average. OK -- but with 1000 systems connected, once the slowest ones were disconnected, the average would go up, locking out more systems and so on...

Just sayin...



The idea proposed was that systems that are BELOW the average turnaround time DO NOT GET WORK. It wasn't that they get a few here and there. It's that they get NONE AT ALL.



Paul D. Buck
Joined: 12 Apr 08
Posts: 621
Credit: 161,934,067
RAC: 0
Message 31942 - Posted: 4 Oct 2009, 7:45:19 UTC - in response to Message 31939.  

Odd-Rod asked a question and I can suggest some specific answers for fast/wide systems ... YMMV

If you are running 6.10.11 and if you have your cache size set to 0.1 and ATI HD4870 cards your turn around time is, on an i7, about either 22 (2 cards) or 44 (1 card) minutes. The system seems to want to cache the full 48 tasks allowed and returns them in batches of about 3-4 and it will attempt to obtain a full cache.

If you attach Collatz, then, the turn around time gets more complex because the return of tasks is biased by the ratio of MW vs. Collatz ... with the time being as above with no Collatz tasks on the system, and increased by the "depth" of the cache of Collatz tasks obtained because of the strict FIFO rule for processing GPU tasks. On those same cards a Collatz task takes about 10 minutes and change. My queue of Collatz tasks seems to be about 10 when I cycle through and get some ...

Unfortunately it is hard to know for sure because of the downtime of MW which means that more than expected numbers of Collatz tasks may be run.

On a system with GTX260 GPUs the run time is about 3:20 to 3:30 and so the numbers would be about 77 minutes to 154 depending on how many cards are installed (I have 2) ... while running MW tasks ... again, strict FIFO means that if I am running other tasks like GPU Grid then the turnaround is going to pop up to 6:30 to 9-something hours between batches.

*IF* you increase cache size, because of the FIFO rule, the run time between batches will be driven by the number of tasks that can be queued by the alternative projects based on the cache size. Again, this is driven solely by the Strict FIFO rule and the number of tasks downloaded by the alternative project. In my one tested case, the cache was 1.25 days causing a download of 90 Collatz tasks and a run time of 900 minutes before I could obtain and start to run one MW batch, rinse and repeat ... note: a cache size above 0.1 something will cause Resource Share allocations to be ignored for MW because of the interaction of the FIFO rule and the limited number of tasks obtainable and how fast they can be run ...

Oh, one more factor, if you are running Collatz and MW on a two ATI card system during the hand-over it looks like BOINC will run both tasks on one card causing the run times to increase, MW tasks to 1:20 minutes or more and Collatz to as much as 16 minutes ...

Both the strict FIFO bug issues and the hand-over run time increase have been reported though the silence is still deafening ...
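For what it's worth, the 900-minute figure above is just the strict-FIFO queue drained end to end; the numbers are the ones Paul reports, plugged into a one-line calculation:

```python
# Strict FIFO: every queued Collatz task runs before the next MW batch.
collatz_tasks = 90       # downloaded to fill the 1.25-day cache
minutes_per_task = 10    # approximate Collatz runtime reported above
wait_minutes = collatz_tasks * minutes_per_task
print(wait_minutes)      # → 900 minutes before the next MW batch starts
```

The arithmetic makes the scaling obvious: doubling the cache (or the other project's task length) doubles the gap between MW batches.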
Brian Silvers
Joined: 21 Aug 08
Posts: 625
Credit: 558,425
RAC: 0
Message 31943 - Posted: 4 Oct 2009, 8:03:00 UTC - in response to Message 31939.  
Last modified: 4 Oct 2009, 8:04:08 UTC


While I'm sure that faster hosts don't have turnarounds of 2.75 days, it would be interesting to know how long the project waits for results.


It's not the average turnaround that really matters. Take the situation from SETI where people would download thousands of tasks and then just sit on them. You could get something similar here, like someone with an i7 grabbing 1000 tasks and then having a power outage due to a flood or a snowstorm (since we are heading into that time of year). That i7 might have a turnaround of 0.01 days when it picks up the large batch, but if the power goes out and they can't run their computer, those 1000 tasks could be destined to hit the 3-day timeout.

As for the project, I remember them saying they'd like to have results turned back in in 1 day, but setting the deadline that low causes major interruptions with other projects, giving users the impression that MW is "hogging their CPU" and forcing their way into making BOINC not honor resource shares / task switching.

It all boils down to a few people who are unhappy that their points aren't going up as fast as they'd like, and who haven't thought through all of the ramifications of what they want to feed their "need". It's not really so much about improving the quality of the project, although ETA is sincere I'm sure...

Finally, it needs to be verified by someone "official" or at least "semi-official" that CPU users are actually disenfranchised more by this latest credit cut than GPU users, as various people appear to not wish to believe the stat sites.

As for me...I grow real weary of having people infer or directly state that those of us who have taken a credit cut and aren't really overly upset about the cut itself, are responsible for the woes of people who have skyrocketing credit each and every day...when the truth of the matter is that if they had taken the credit cut that CPU users have, they wouldn't be here anymore...
Brian Silvers
Joined: 21 Aug 08
Posts: 625
Credit: 558,425
RAC: 0
Message 31944 - Posted: 4 Oct 2009, 8:11:17 UTC - in response to Message 31942.  

Odd-Rod asked a question and I can suggest some specific answers for fast/wide systems ... YMMV

If you are running 6.10.11 and if you have your cache size set to 0.1 and ATI HD4870 cards your turn around time is, on an i7, about either 22 (2 cards) or 44 (1 card) minutes. The system seems to want to cache the full 48 tasks allowed and returns them in batches of about 3-4 and it will attempt to obtain a full cache.

If you attach Collatz, then, the turn around time gets more complex because the return of tasks is biased by the ratio of MW vs. Collatz ... with the time being as above with no Collatz tasks on the system, and increased by the "depth" of the cache of Collatz tasks obtained because of the strict FIFO rule for processing GPU tasks. On those same cards a Collatz task takes about 10 minutes and change. My queue of Collatz tasks seems to be about 10 when I cycle through and get some ...

Unfortunately it is hard to know for sure because of the downtime of MW which means that more than expected numbers of Collatz tasks may be run.

On a system with GTX260 GPUs the run time is about 3:20 to 3:30 and so the numbers would be about 77 minutes to 154 depending on how many cards are installed (I have 2) ... while running MW tasks ... again, strict FIFO means that if I am running other tasks like GPU Grid then the turnaround is going to pop up to 6:30 to 9-something hours between batches.

*IF* you increase cache size, because of the FIFO rule, the run time between batches will be driven by the number of tasks that can be queued by the alternative projects based on the cache size. Again, this is driven solely by the Strict FIFO rule and the number of tasks downloaded by the alternative project. In my one tested case, the cache was 1.25 days causing a download of 90 Collatz tasks and a run time of 900 minutes before I could obtain and start to run one MW batch, rinse and repeat ... note: a cache size above 0.1 something will cause Resource Share allocations to be ignored for MW because of the interaction of the FIFO rule and the limited number of tasks obtainable and how fast they can be run ...

Oh, one more factor, if you are running Collatz and MW on a two ATI card system during the hand-over it looks like BOINC will run both tasks on one card causing the run times to increase, MW tasks to 1:20 minutes or more and Collatz to as much as 16 minutes ...

Both the strict FIFO bug issues and the hand-over run time increase have been reported though the silence is still deafening ...


Maybe these users who wish to pile their problems, both real and perceived, upon those of us with "slow systems" might consider aiming their angst at UCB and the BOINC development team then...
Paul D. Buck
Joined: 12 Apr 08
Posts: 621
Credit: 161,934,067
RAC: 0
Message 31948 - Posted: 4 Oct 2009, 9:36:11 UTC - in response to Message 31944.  

Maybe these users who wish to pile their problems, both real and perceived, upon those of us with "slow systems" might consider aiming their angst at UCB and the BOINC development team then...

I have no complaints about users with slow systems ... :)

I have had a few of them myself in the past. And I am less worried about wingmen because over time it all comes out in the wash ... I have tasks adding up to 500 CS on Collatz which means there is a lot more than that queued. I will likely add more or maybe I have reached "steady state" and will just run up and down about that number pending. Over time my wingmen (and women) will turn in tasks and the list will shrink and I will add more and the list will grow ...

I was just trying to illustrate my "usual" scenarios ...

We had another long debate on straightening out the credit nonsense again and it petered out with no conclusion as to a rational way to fix the issues. Sadly, any one of a number of the people that seem to command UCB's attention will find objections to any proposal without suggesting alternatives to improve the proposal, or even suggesting one of their own with the comprehensive detail needed to understand it ... so, here we sit with a system that causes angst and bitter division and no end in sight ...
ExtraTerrestrial Apes
Joined: 1 Sep 08
Posts: 204
Credit: 219,354,537
RAC: 0
Message 31953 - Posted: 4 Oct 2009, 11:43:54 UTC - in response to Message 31902.  

Hi Brian,

sorry, it was only after my last post that I realized what your main problem with my suggestion was. There's actually an easy fix for this, I just didn't write it down initially. So you may or may not believe me when I say that by "sane limits" I meant upper and lower limits.

Just set the minimum number of concurrent WUs to n(CPU), n(CPU)+1, 2*n(CPU) or 2*(n(CPU)+n(GPU)) - whatever seems most appropriate.

I wouldn't be too worried here about slow machines getting too much work, as BOINC is not supposed to fetch more work than it can handle anyway. And the case you mentioned above: a fast machine getting many WUs and not being able to crunch or report them - well, that's something the deadline has to take care of. Currently each machine could screw up a maximum of 48 WUs this way. This number would obviously increase, so one would have to make sure not to send 1 million WUs to a host, no matter what its BOINC requests.

MrS
Scanning for our furry friends since Jan 2002
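Those "sane limits" could be sketched as a simple clamp (the 2*(n(CPU)+n(GPU)) floor is just one of the options listed above, and the hard cap of 1000 is an assumed number, not anything from the project):

```python
def clamp_quota(requested: int, n_cpu: int, n_gpu: int,
                hard_cap: int = 1000) -> int:
    """Clamp a host's WU allowance between a per-processor floor and a cap.

    The floor guarantees even the slowest host gets enough work to keep
    every processor busy; the cap keeps a fast host from hoarding work
    it may never return before the 3-day deadline.
    """
    floor = 2 * (n_cpu + n_gpu)
    return max(floor, min(hard_cap, requested))

# A slow dual-core asking for 1 WU still gets 4:
print(clamp_quota(1, n_cpu=2, n_gpu=0))          # → 4
# A fast host asking for a million is held to the cap:
print(clamp_quota(1_000_000, n_cpu=8, n_gpu=2))  # → 1000
```

Between the floor and the cap, the turnaround-based formula can vary freely without either shutting slow hosts out or letting fast hosts screw up thousands of WUs.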

©2024 Astroinformatics Group