Is milkyway@home dead?
Alinator

Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0
Message 19221 - Posted: 17 Apr 2009, 17:24:31 UTC - in response to Message 19211.  
Last modified: 17 Apr 2009, 17:32:38 UTC


Sorry, but that kind of advice appears to me to be misleading, and there's already too much confusion and frustration as it is. So, until Travis & Co. kindly tell us what's really happening on the server, please avoid statements of "fact" unless it is fact.

To wit: I have never seen the ready units level below 400, and I've checked many, many, many, many times. If it ever fell to zero, then 10 minutes later (or whenever the refresh happened), it would have shown as zero (or very, very low). It never has for me. So I don't believe this idea that the refresh delay hides an empty supply.

Now, if work is being prepared in batches rather than continuously, and if the refresh happens at the same time a new batch is prepared, then your speculation has merit.

Does anyone know?


Yes, we do know.

Travis has said more than once, since the rollout of the GPU apps, the 'gadfly' work-fetch scripts, etc., that every time he has checked, the work generators are capable of producing more than enough work; the problem is that they just can't feed it through the scheduler fast enough to meet the constant demand.

<edit> Also, as Borandi mentioned, the Server Status pages are snapshots generated at project-selected intervals, and therefore have to be taken with a grain of salt when relating them to whether and/or why you get a "no work" dust-off from the scheduler.

Alinator
6dj72cn8

Joined: 26 Dec 07
Posts: 41
Credit: 2,582,082
RAC: 0
Message 19293 - Posted: 18 Apr 2009, 4:22:32 UTC - in response to Message 19221.  

I have never seen the ready units level below 400, and I've checked many, many, many, many times. If it ever fell to zero, then 10 minutes later (or whenever the refresh happened), it would have shown as zero (or very, very low). It never has for me.

More than enough total work can be created. The bottleneck is that our requests for WUs hammer the scheduler queue so fast that the available WUs drain to zero before the feeder can get the next batch of work in from the work generator.

So while the server's internal request for another batch of jobs is generated once the queue drops to about 400-500 (which is why the server status never shows below 400-500), the scheduler actually runs out before the feeder can get the next batch in.
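To make the mechanism described above concrete, here is a toy two-stage simulation. It is only a sketch: the pool sizes, feeder rate, demand rate, and snapshot interval are all invented, and this is not the actual BOINC feeder or scheduler code. It illustrates how a status page that samples the database pool of unsent results can keep showing hundreds of results "ready to send" while the small in-memory queue the scheduler actually serves from runs dry between feeder passes.

```python
# Toy model of the bottleneck described above (NOT real BOINC server code;
# all sizes and rates are invented for illustration).
import random

DB_REFILL_TRIGGER = 400     # work generator makes a new batch below this level
DB_BATCH_SIZE     = 500     # size of each generated batch (assumed)
FEEDER_SLOTS      = 100     # in-memory slots the scheduler hands work out from
FEEDER_RATE       = 50      # results/second the feeder can move (assumed)
DEMAND_RATE       = 120     # peak results/second requested by clients (assumed)

db_pool, slots = DB_BATCH_SIZE, FEEDER_SLOTS
empty_replies, status_samples = 0, []

for t in range(3600):                          # one simulated hour, 1 s steps
    if db_pool < DB_REFILL_TRIGGER:            # generator tops up the DB pool
        db_pool += DB_BATCH_SIZE
    moved = min(FEEDER_RATE, db_pool, FEEDER_SLOTS - slots)
    db_pool -= moved                           # feeder copies results to slots
    slots += moved
    for _ in range(random.randint(80, DEMAND_RATE)):
        if slots > 0:
            slots -= 1                         # a client gets a result
        else:
            empty_replies += 1                 # a client gets "no work available"
    if t % 600 == 0:                           # status page snapshot every 10 min
        status_samples.append(db_pool)

print("status page samples:", status_samples)  # hovers near the refill level, never near zero
print("'no work' replies  :", empty_replies)   # yet many requests are turned away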
IndianaX
Joined: 30 Oct 08
Posts: 3
Credit: 7,031,045
RAC: 0
Message 19646 - Posted: 20 Apr 2009, 8:14:58 UTC

How do other (bigger) projects handle that?
WCG, for example, always delivers enough WUs! And WCG has more users.
Profile verstapp
Joined: 26 Jan 09
Posts: 589
Credit: 497,834,261
RAC: 0
Message 19647 - Posted: 20 Apr 2009, 8:42:03 UTC

The seemingly obvious solution is to double [triple?] up on the scheduler - find a couple of spare boxes and set them to do nothing but scheduling to help ease the load. But perhaps there are other considerations, e.g. bandwidth, physical access to the boxen [CPDN has this problem a lot], whatever. Plus, the crew does have other things to occupy their time, and perhaps this problem just slipped down the to-do list a bit.
Cheers,

PeterV

.
Profile borandi
Joined: 21 Feb 09
Posts: 180
Credit: 27,806,824
RAC: 0
Message 19648 - Posted: 20 Apr 2009, 8:52:59 UTC - in response to Message 19647.  

The seemingly obvious solution is to double [triple?] up on the scheduler - find a couple of spare boxes and set them to do nothing but scheduling to help ease the load. But perhaps there are other considerations, e.g. bandwidth, physical access to the boxen [CPDN has this problem a lot], whatever. Plus, the crew does have other things to occupy their time, and perhaps this problem just slipped down the to-do list a bit.


I remember one of the admins saying that they'd had a call from the university IT guys about using 250GB of total upload in a month, asking them to try and reduce it...!
Profile verstapp
Joined: 26 Jan 09
Posts: 589
Credit: 497,834,261
RAC: 0
Message 19655 - Posted: 20 Apr 2009, 10:27:36 UTC
Last modified: 20 Apr 2009, 10:28:57 UTC

Pkzip? CPDN files tend to arrive/depart in a zipped state. But when you're doing a 30MB [zipped] upload at the end of every WU, you appreciate it.
Bigger WUs, so that even the GPUers make less frequent trips to the fount of WUs? But as I said before, there may be other considerations...
Cheers,

PeterV

.
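For what it's worth, the kind of saving being suggested here is easy to sketch. The snippet below is only an illustration with a hypothetical file name; it is not something the MilkyWay@home client or server actually does, and any real support would have to live in the project's validator/assimilator and client code.

```python
# Sketch only: gzip a result file before upload to cut bandwidth, in the
# spirit of CPDN's zipped transfers.  The file name is hypothetical.
import gzip
import os
import shutil

def compress_result(path: str) -> str:
    """Gzip a result file and return the compressed file's path."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb", compresslevel=9) as dst:
        shutil.copyfileobj(src, dst)          # stream the file through gzip
    ratio = os.path.getsize(gz_path) / os.path.getsize(path)
    print(f"{path}: {ratio:.0%} of original size after compression")
    return gz_path

# Example (hypothetical file name):
# compress_result("milkyway_result_12345.out")
```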
Profile Debs

Joined: 15 Jan 09
Posts: 169
Credit: 6,734,481
RAC: 0
Message 19678 - Posted: 20 Apr 2009, 13:04:56 UTC

A few of us have suggested larger WUs in the past. I remember Travis saying it would not happen (then two days later the size was increased!). I still think that would be a great idea, and not just for people with GPUs.

I don't mind tasks taking 12 hours, as long as they are stable and have regular checkpoints :)
Profile banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 19692 - Posted: 20 Apr 2009, 14:19:10 UTC - in response to Message 19678.  

A few of us have suggested larger WUs in the past. I remember Travis saying it would not happen (then two days later the size was increased!). I still think that would be a great idea, and not just for people with GPUs.

I don't mind tasks taking 12 hours, as long as they are stable and have regular checkpoints :)


For this project I like to have at least 30-minute WUs. When the project is stable, longer would be fine too. I would hate to lose credit for a few (or more) 12-hour units when the server goes down.
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.
Brian Silvers

Joined: 21 Aug 08
Posts: 625
Credit: 558,425
RAC: 0
Message 19696 - Posted: 20 Apr 2009, 14:43:21 UTC - in response to Message 19692.  

A few of us have suggested larger WUs in the past. I remember Travis saying it would not happen (then two days later the size was increased!). I still think that would be a great idea, and not just for people with GPUs.

I don't mind tasks taking 12 hours, as long as they are stable and have regular checkpoints :)


For this project I like to have at least 30-minute WUs. When the project is stable, longer would be fine too. I would hate to lose credit for a few (or more) 12-hour units when the server goes down.


Either Dave or Travis (I think Travis) said that the current WU size for CPUs is working out for them just fine. I think what some don't realize is that future workunits are based upon the results of current workunits. This is different from projects like SETI or Einstein, where the data was recorded and is unchanging and where they just need systems to plow through chunks of the data.

All this discussion really should just wait until the GPU project is up and running, then see what the situation is...
Profile banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 19701 - Posted: 20 Apr 2009, 15:06:14 UTC - in response to Message 19696.  

A few of us have suggested larger WUs in the past. I remember Travis saying it would not happen (then two days later the size was increased!). I still think that would be a great idea, and not just for people with GPUs.

I don't mind tasks taking 12 hours, as long as they are stable and have regular checkpoints :)


For this project I like to have at least 30-minute WUs. When the project is stable, longer would be fine too. I would hate to lose credit for a few (or more) 12-hour units when the server goes down.


Either Dave or Travis (I think Travis) said that the current WU size for CPUs is working out for them just fine. I think what some don't realize is that future workunits are based upon the results of current workunits. This is different from projects like SETI or Einstein, where the data was recorded and is unchanging and where they just need systems to plow through chunks of the data.

All this discussion really should just wait until the GPU project is up and running, then see what the situation is...


I'm not saying to change anything. The current size, which runs about 30 minutes, is fine. I was just saying that longer would be OK too.
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.
Alinator

Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0
Message 19704 - Posted: 20 Apr 2009, 15:41:36 UTC - in response to Message 19701.  
Last modified: 20 Apr 2009, 15:44:02 UTC

A few of us have suggested larger WUs in the past. I remember Travis saying it would not happen (then two days later the size was increased!). I still think that would be a great idea, and not just for people with GPUs.

I don't mind tasks taking 12 hours, as long as they are stable and have regular checkpoints :)


For this project I like to have at least 30-minute WUs. When the project is stable, longer would be fine too. I would hate to lose credit for a few (or more) 12-hour units when the server goes down.


Either Dave or Travis (I think Travis) said that the current WU size for CPUs is working out for them just fine. I think what some don't realize is that future workunits are based upon the results of current workunits. This is different from projects like SETI or Einstein, where the data was recorded and is unchanging and where they just need systems to plow through chunks of the data.

All this discussion really should just wait until the GPU project is up and running, then see what the situation is...


I'm not saying to change anything. The current size, which runs about 30 minutes, is fine. I was just saying that longer would be OK too.



Well, I think that Brian's main point is not the relative difficulty (runtime) of any given set of work on any given host.

The point is that MW searches are simulations, and not rote data reduction or analysis like SAH or EAH for example.

Therefore, even if the backend had unlimited capacity to feed work out to the client population, the several-orders-of-magnitude difference in processing speed between the slowest and fastest hosts means you will eventually reach a point where the parallel processing streams get blocked: the next step has to wait for results to be returned from the slower hosts before it can determine how to proceed with the next round of calculation.

It should be apparent that this is one area where the homogeneous computing capabilities of more traditional supercomputers have an advantage. If one takes the time to look over the publications, one sees that finding methods to work around little problems like that is one of the objectives of the Computer Science aspect of MW.

Alinator
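To illustrate the dependency described above, here is a rough sketch of an asynchronous search loop in which each new workunit's parameters are derived from whatever results have already come back, so no single slow host blocks progress. The fitness function and recombination rule below are stand-ins invented for the example; this is an illustration of the general technique, not MilkyWay@home's actual search code.

```python
# Sketch: each new workunit's parameters are derived from results already
# returned, and results are folded in whenever they arrive (order-agnostic).
import random

def fitness(params):
    # Stand-in for the real likelihood computed by a workunit.
    return -sum((p - 0.5) ** 2 for p in params)

def new_trial(population):
    # Combine two known-good results to propose the next workunit's parameters.
    a, b = random.sample(population, 2)
    return [(x + y) / 2 + random.gauss(0, 0.05) for x, y in zip(a, b)]

# Seed "results" as if a first batch of workunits had already been returned.
results = [[random.random() for _ in range(3)] for _ in range(10)]

for _ in range(100):                       # 100 asynchronous returns
    trial = new_trial(results)             # issue the next WU from current results
    results.append(trial)                  # some host returns it, sooner or later
    results.sort(key=fitness, reverse=True)
    results = results[:10]                 # keep only the best results so far

print("best parameters found:", results[0])
```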
Brian Silvers

Joined: 21 Aug 08
Posts: 625
Credit: 558,425
RAC: 0
Message 19706 - Posted: 20 Apr 2009, 15:54:21 UTC - in response to Message 19704.  


All this discussion really should just wait until the GPU project is up and running, then see what the situation is...


I'm not saying to change anything. The current size, which runs about 30 minutes, is fine. I was just saying that longer would be OK too.



Well, I think that Brian's main point is not the relative difficulty (runtime) of any given set of work on any given host.

The point is that MW searches are simulations, and not rote data reduction or analysis like SAH or EAH for example.


Right. Longer-runtime tasks can actually be a hindrance rather than a help. The longer the runtime, or the deadline for that matter, the larger the impact on the search if a lot of people start turning tasks in towards the end of the deadline. The ideal situation would be keeping a P4 to about 30-45 minutes per task, then using our "wasteful" CPU calculations to feed the GPUs so that they could do the more complex calculations, leaving the simpler/mundane work for the CPUs. Symbiosis... rather than fighting over who has more points, who can get work and who can't, or who should or should not be participating...
Profile Paul D. Buck

Joined: 12 Apr 08
Posts: 621
Credit: 161,934,067
RAC: 0
Message 19719 - Posted: 20 Apr 2009, 18:58:31 UTC - in response to Message 19704.  

It should be apparent that this is one area where the homogeneous computing capabilities of more traditional supercomputers have an advantage. If one takes the time to look over the publications, one sees that finding methods to work around little problems like that is one of the objectives of the Computer Science aspect of MW.

GPU Grid has similar problems in that the processing is a "stream" of tasks, one building on another. I don't know how long their streams are, but a task returned by someone is re-issued as a new task to me, and when I return that it is made into another task to be issued to someone else.

Their current approach is to pay a bonus to participants who return tasks within a window shorter than the deadline. Though this is raising some angst among those who want to run slower cards, my own take on it is that this is a fine example of economics at work ... create an incentive to aid the project with faster returns.

I don't plan to run out and replace my 9800GT with another GPU in the next 24 hours ... especially since it seems to be earning the bonus most of the time ... in fact, I think the most cost-effective, or simply lowest-cost, way to build a system that earns a consistent, reliable bonus for the work done would be to install 2-3 9800GTs in one box ... but I digress ...
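The incentive described here amounts to a simple rule of the kind sketched below. The thresholds and multipliers are made up for illustration; they are not GPUGrid's (or MilkyWay@home's) actual credit formula.

```python
# Sketch of an early-return bonus rule; all numbers here are invented.
def credited(base_credit: float, turnaround_hours: float, deadline_hours: float) -> float:
    """Pay more credit the sooner a task comes back relative to its deadline."""
    fraction_used = turnaround_hours / deadline_hours
    if fraction_used <= 0.25:        # returned in the first quarter of the window
        return base_credit * 1.50
    if fraction_used <= 0.50:        # returned in the first half of the window
        return base_credit * 1.25
    return base_credit               # on time, but no bonus

# Example: a 5-day (120 h) deadline, task returned after 18 hours.
print(credited(1000.0, 18.0, 120.0))   # -> 1500.0
```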
Profile Mitchell

Joined: 29 Dec 07
Posts: 16
Credit: 158,120,935
RAC: 0
Message 20728 - Posted: 28 Apr 2009, 14:30:56 UTC - in response to Message 19719.  

Back to the thread topic: "Is milkyway@home dead?"

With a little casual observation, yes, it sure appears so.
Brian Silvers

Joined: 21 Aug 08
Posts: 625
Credit: 558,425
RAC: 0
Message 20729 - Posted: 28 Apr 2009, 14:47:22 UTC - in response to Message 20728.  

Back to the thread topic: "Is milkyway@home dead?"

With a little casual observation, yes, it sure appears so.


My own casual observation is that "dead" as defined by you apparently means "no news on the GPU project". Work is still flowing, just not at a sufficient rate to keep everyone fully occupied. This actually means that the project is "functioning at full capacity", not that it is "dead".

That said, at nearly two weeks from "almost there", I think an update is due...
Profile banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 20731 - Posted: 28 Apr 2009, 15:25:40 UTC - in response to Message 20729.  

"functioning at full capacity"

"full" is actually ~23% of where it was maxed at the end of Feb.


That said, at nearly two weeks from "almost there", I think an update is due...


I agree. Nothing has changed on the GPU site in almost as long. Still 'bugs' everywhere.
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.
Profile GalaxyIce
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
Message 20735 - Posted: 28 Apr 2009, 16:11:24 UTC - in response to Message 20729.  

, at nearly two weeks from "almost there", I think an update is due...

It would be nice to hear something. It's awfully quiet around here recently...


Alinator

Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0
Message 20737 - Posted: 28 Apr 2009, 16:43:29 UTC - in response to Message 20731.  
Last modified: 28 Apr 2009, 16:44:20 UTC

"functioning at full capacity"

"full" is actually ~23% of where it was maxed at the end of Feb.


That said, at nearly two weeks from "almost there", I think an update is due...


I agree. Nothing has changed on the GPU site in almost as long. Still 'bugs' everywhere.


Hmmm...

I don't think that came out quite the way you wanted. ;-)

I'd agree that the project is currently running at ~23% less credit throughput overall than it was before the big boo-boo. :-(

@ Ice:

Of course, end of the year exams and other little problems like that might be throwing a wrench into the works a bit. ;-)

Alinator
Profile GalaxyIce
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
Message 20744 - Posted: 28 Apr 2009, 20:06:39 UTC - in response to Message 20737.  

@ Ice:

Of course, end of the year exams and other little problems like that might be throwing a wrench into the works a bit. ;-)

Alinator

Exams? Good luck with them. I took one recently, the first for a long, long time. It was truly horrid, all that studying into the night and it just taking over everything for a while. But it's great to pass, yippee!, and have another certificate to file away... ;)

