Message boards : Number crunching : Is milkyway@home dead?
Send message Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
Yes, we do know. Travis has said more than once since the rollout of the GPU apps, 'gadfly' work fetch scripts, etc. that every time he has checked, the work generators are capable of producing more than enough work; the problem is they just can't feed it through the scheduler fast enough to meet the constant demand. <edit> Also, as Borandi mentioned, the Server Status pages are snapshots generated at project-selected intervals, and therefore have to be taken with a grain of salt when relating them to whether and/or why you get a 'no work' brush-off from the scheduler. Alinator |
Send message Joined: 26 Dec 07 Posts: 41 Credit: 2,582,082 RAC: 0 |
I have never seen the ready units level below 400, and I've checked many, many, many, many times. If it ever fell to zero, then 10 minutes later (or whenever the refresh happened), it would have shown as zero (or very, very low). It never has for me. More than enough total work can be created. The bottleneck is that our requests for WUs hammer the scheduler queue so fast that the available WUs drain to zero before the feeder can get the next batch of work in from the work generator. So while an internal request for another batch of jobs is generated once the queue drops to about 400-500 (which is why the server status never shows below 400-500), the scheduler actually runs out before the feeder can get the next batch in. |
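The distinction drawn above — plenty of work "ready to send" in the database, yet "no work" replies from the scheduler — can be sketched in a toy simulation. All numbers below are invented for illustration; the structure loosely follows the BOINC server design, where the feeder copies unsent results from the database into a small shared-memory slot array that the scheduler serves from, while the status page reports the database count:

```python
# Toy model of the pipeline described in the post above. Numbers are
# hypothetical; only the shape of the bottleneck matters.
SHMEM_SLOTS = 100             # feeder's shared-memory cache of ready tasks
GEN_LOW_WATER = 400           # work generator refills the DB below this level
GEN_BATCH = 600               # results created per generator pass
REQUESTS_PER_TICK = 150       # client work requests hitting the scheduler per tick
FEEDER_MOVE_PER_TICK = 100    # tasks/tick the feeder can move DB -> shared memory

db_unsent = 1000              # "Ready to send" count the status page reports
shmem = SHMEM_SLOTS           # what the scheduler can actually hand out
no_work = 0
status_page_min = db_unsent

for tick in range(500):
    # Work generator: keeps the DB topped up. This is never the bottleneck.
    if db_unsent < GEN_LOW_WATER:
        db_unsent += GEN_BATCH

    # Feeder: moves a limited number of tasks into shared memory each tick.
    moved = min(FEEDER_MOVE_PER_TICK, SHMEM_SLOTS - shmem, db_unsent)
    shmem += moved
    db_unsent -= moved

    # Scheduler: can only serve from the shared-memory cache.
    for _ in range(REQUESTS_PER_TICK):
        if shmem > 0:
            shmem -= 1
        else:
            no_work += 1      # client gets a "no work available" reply

    # Status page: snapshots the DB count, not the shared-memory cache.
    status_page_min = min(status_page_min, db_unsent)

print("lowest 'ready to send' the status page ever showed:", status_page_min)
print("'no work' replies actually sent:", no_work)
```

With these made-up numbers the status page never shows fewer than 300 ready results, yet a third of all requests come back empty-handed — consistent with the observation that the reported level never drops near zero even while clients are being turned away.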
Send message Joined: 30 Oct 08 Posts: 3 Credit: 7,031,045 RAC: 0 |
How do other (bigger) projects handle that? E.g. WCG always delivers enough WUs! And WCG has more users. |
Send message Joined: 26 Jan 09 Posts: 589 Credit: 497,834,261 RAC: 0 |
The seemingly obvious solution is to double [triple?] up on the scheduler - find a couple of spare boxes and set them to doing nothing but scheduling to help ease the load. But perhaps there are other considerations, eg bandwidth, physical access to the boxen [CPDN has this problem a lot], whatever. Plus, the crew does have other things to occupy their time and perhaps this problem just slipped down the todo list a bit. Cheers, PeterV . |
Send message Joined: 21 Feb 09 Posts: 180 Credit: 27,806,824 RAC: 0 |
The seemingly obvious solution is to double [triple?] up on the scheduler - find a couple of spare boxes and set them to doing nothing but scheduling to help ease the load. But perhaps there are other considerations, eg bandwidth, physical access to the boxen [CPDN has this problem a lot], whatever. Plus, the crew does have other things to occupy their time and perhaps this problem just slipped down the todo list a bit. I remember one of the admins saying that they'd had a call from the University IT guys, for using 250GB total upload in a month and to try and reduce it...! |
Send message Joined: 26 Jan 09 Posts: 589 Credit: 497,834,261 RAC: 0 |
Pkzip? CPDN files tend to arrive/depart in a zipped state. But when you're doing a 30MB [zipped] upload at the end of every WU you appreciate it. Bigger WUs, so that even the GPUers do less frequent trips to the fount of WUs? But as I said before, there may be other considerations... Cheers, PeterV . |
Send message Joined: 15 Jan 09 Posts: 169 Credit: 6,734,481 RAC: 0 |
A few of us have suggested larger wu in the past. I remember Travis saying that would not happen (then 2 days later the size was increased!). I still think that would be a great idea, and not just for people with GPUs. I don't mind tasks taking 12 hours, as long as they are stable and have regular checkpoints :) |
Send message Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
A few of us have suggested larger wu in the past. I remember Travis saying that would not happen (then 2 days later the size was increased!). I still think that would be a great idea, and not just for people with GPUs. For this project I like to have at least 30 min wu's. When the project is stable, longer would be fine too. I would hate to lose credit for a few (or more) 12 hour units when the server goes off. Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. |
Send message Joined: 21 Aug 08 Posts: 625 Credit: 558,425 RAC: 0 |
A few of us have suggested larger wu in the past. I remember Travis saying that would not happen (then 2 days later the size was increased!). I still think that would be a great idea, and not just for people with GPUs. Either Dave or Travis (I think Travis) said that the current WU size for CPUs is working out for them just fine. I think what some don't realize is that future workunits are based upon the results of current workunits. This is different from projects like SETI or Einstein, where the data was recorded and is unchanging and where they just need systems to plow through chunks of the data. All this discussion really should just wait until the GPU project is up and running, then see what the situation is... |
Send message Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
A few of us have suggested larger wu in the past. I remember Travis saying that would not happen (then 2 days later the size was increased!). I still think that would be a great idea, and not just for people with GPUs. I'm not saying to change anything. The current size which runs 30 min is fine. I was just saying that longer would be ok too. Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. |
Send message Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
A few of us have suggested larger wu in the past. I remember Travis saying that would not happen (then 2 days later the size was increased!). I still think that would be a great idea, and not just for people with GPUs. Well, I think that Brian's main point is not the relative difficulty (runtime) of any given set of work on any given host. The point is that MW searches are simulations, and not rote data reduction or analysis like SAH or EAH for example. Therefore, even if the backend had unlimited capability in being able to feed work out to the client population, given the several orders of magnitude difference in processing speed between the slowest and fastest hosts, you are going to reach a point where the parallel processing streams get blocked because the next step is waiting for results from slower hosts to be returned in order to determine how to proceed for the next round of calculation. It should be apparent that this is one area where the homogeneous computing capabilities of more traditional supercomputers have an advantage. If one takes the time to look over the publications, one sees that finding methods for working around little problems like that is one of the objectives of the Computer Science aspect of MW. Alinator |
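The barrier effect described above can be made concrete with a small sketch. The host mix below is invented; only the orders-of-magnitude spread in per-task runtimes matters. It compares a fully synchronous round (wait for every result) against a quorum-based round (proceed once, say, 80% of results are back — one of the workarounds asynchronous-search research explores):

```python
# Hypothetical host population: per-task runtimes in hours spanning
# roughly three orders of magnitude, from fast GPUs to slow CPUs.
host_runtimes_h = [0.05] * 60 + [0.5] * 25 + [5.0] * 10 + [24.0] * 5

# Synchronous search: the next round of parameters cannot be generated
# until every result from the current round is back, so a single slow
# host stalls the entire population.
sync_generation_h = max(host_runtimes_h)

# Quorum-based round: proceed once 80% of results have returned,
# assuming all tasks are dispatched at the same time.
returned = sorted(host_runtimes_h)
quorum_h = returned[int(0.8 * len(returned))]

print(f"full-barrier round time: {sync_generation_h} h")
print(f"80%-quorum round time:   {quorum_h} h")
```

With this mix, the full barrier costs 24 hours per round while the 80% quorum costs half an hour — which is why dropping the strict barrier (at the cost of sometimes iterating on incomplete information) is attractive for a heterogeneous volunteer population.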
Send message Joined: 21 Aug 08 Posts: 625 Credit: 558,425 RAC: 0 |
Right. Longer runtime tasks can actually be a hindrance rather than a help. The longer the runtime, or the deadline for that matter, the larger the impact on the search if a lot of people start turning them in towards the end of the deadline. The ideal situation would be keeping a P4 to about 30-45 minutes per, then using our "wasteful" CPU calculations to feed the GPUs so that they could do more complex calculations, leaving the simpler / mundane work for CPUs. Symbiosis...rather than fighting over who has more points or who can get work and who can't...or who should or should not be participating... |
Send message Joined: 12 Apr 08 Posts: 621 Credit: 161,934,067 RAC: 0 |
It should be apparent that this is one area where the homogeneous computing capabilities of more traditional supercomputers have an advantage. If one takes the time to look over the publications, one sees that methods for working around little problems like that is one of the objectives of the Computer Science aspect of MW. GPU Grid has similar problems in that the processing is a "stream" of tasks, one building on another. I don't know how long their streams are, but the task returned by someone is re-issued as a new task to me, and when I return that it is made into another task to be issued to someone else. Their approach is currently to try to pay a bonus to participants that are returning tasks within a time period that is less than the deadline. Though this is raising some angst amongst those that want to run slower cards, my own take on it is that this is a fine example of economics at work ... create an incentive to aid the project with faster returns. I don't plan to run out and replace my 9800GT with another GPU in the next 24 hours ... especially since it seems to be earning the bonus most of the time ... in fact, I think the most cost-effective, or just the lowest-cost, system for earning a consistent, reliable bonus for work done would be one with 2-3 9800GTs installed ... but I digress ... |
Send message Joined: 29 Dec 07 Posts: 16 Credit: 158,120,935 RAC: 0 |
back to thread topic. "Is milkyway@home dead? " With a little casual observation, yes, it sure appears so. |
Send message Joined: 21 Aug 08 Posts: 625 Credit: 558,425 RAC: 0 |
back to thread topic. "Is milkyway@home dead? " My own casual observation is that "dead" as defined by you apparently means "no news on the GPU project". Work is still flowing, just not at a sufficient rate to keep everyone fully occupied. This actually means that the project is "functioning at full capacity", not that it is "dead". That said, at nearly two weeks from "almost there", I think an update is due... |
Send message Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
"functioning at full capacity" "full" is actually ~23% of where it was maxed at the end of Feb. That said, at nearly two weeks from "almost there", I think an update is due... I agree. Nothing has changed on the Gpu site in almost as long. Still 'bugs' everywhere. Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. |
Send message Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
"functioning at full capacity" Hmmm... I don't think that came out quite the way you wanted. ;-) I'd agree that the project is running currently at ~23% less credit throughput overall than what it was before the big boo-boo. :-( @ Ice: Of course, end of the year exams and other little problems like that might be throwing a wrench into the works a bit. ;-) Alinator |
Send message Joined: 6 Apr 08 Posts: 2018 Credit: 100,142,856 RAC: 0 |
@ Ice: Exams? Good luck with them. I took one recently, the first for a long, long time. It was truly horrid, all that studying into the night and it just taking over everything for a while. But it's great to pass, yippee!, and have another certificate to file away... ;) |
©2024 Astroinformatics Group