Message boards : Number crunching : Is milkyway@home dead?
Send message Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
Yes, we do know. Travis has said more than once since the rollout of the GPU apps, 'gadfly' work fetch scripts, etc. that every time he has checked, the work generators are capable of producing more than enough work; the problem is they just can't feed it through the scheduler fast enough to meet the constant demand. <edit> Also, as Borandi mentioned, the Server Status pages are snapshots generated at project-selected intervals, and therefore have to be taken with a grain of salt when relating them to whether and/or why you get a 'no work' brush-off from the scheduler. Alinator |
Send message Joined: 26 Dec 07 Posts: 41 Credit: 2,582,082 RAC: 0 |
I have never seen the ready units level below 400, and I've checked many, many, many, many times. If it ever fell to zero, then 10 minutes later (or whenever the refresh happened), it would have shown as zero (or very, very low). It never has for me. More than enough total work can be created. The bottleneck is that our requests for WUs hammer the scheduler queue so fast that the available WUs drain to zero before the feeder can get the next batch of work in from the work generator. So while an internal request for another batch of jobs is generated once the queue drops to about 400-500 (which is why the server status never shows below 400-500), the scheduler actually runs out before the feeder can get the next batch in. |
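The distinction drawn above — plenty of work "ready to send" in the database, yet "no work" replies from the scheduler — can be sketched in a toy simulation. All numbers below are invented for illustration; the structure loosely follows the BOINC server design, where the feeder copies unsent results from the database into a small shared-memory slot array that the scheduler serves from, while the status page reports the database count:

```python
# Toy model of the pipeline described in the post above. Numbers are
# hypothetical; only the shape of the bottleneck matters.
SHMEM_SLOTS = 100             # feeder's shared-memory cache of ready tasks
GEN_LOW_WATER = 400           # work generator refills the DB below this level
GEN_BATCH = 600               # results created per generator pass
REQUESTS_PER_TICK = 150       # client work requests hitting the scheduler per tick
FEEDER_MOVE_PER_TICK = 100    # tasks/tick the feeder can move DB -> shared memory

db_unsent = 1000              # "Ready to send" count the status page reports
shmem = SHMEM_SLOTS           # what the scheduler can actually hand out
no_work = 0
status_page_min = db_unsent

for tick in range(500):
    # Work generator: keeps the DB topped up. This is never the bottleneck.
    if db_unsent < GEN_LOW_WATER:
        db_unsent += GEN_BATCH

    # Feeder: moves a limited number of tasks into shared memory each tick.
    moved = min(FEEDER_MOVE_PER_TICK, SHMEM_SLOTS - shmem, db_unsent)
    shmem += moved
    db_unsent -= moved

    # Scheduler: can only serve from the shared-memory cache.
    for _ in range(REQUESTS_PER_TICK):
        if shmem > 0:
            shmem -= 1
        else:
            no_work += 1      # client gets a "no work available" reply

    # Status page: snapshots the DB count, not the shared-memory cache.
    status_page_min = min(status_page_min, db_unsent)

print("lowest 'ready to send' the status page ever showed:", status_page_min)
print("'no work' replies actually sent:", no_work)
```

With these made-up numbers the status page never shows fewer than 300 ready results, yet a third of all requests come back empty-handed — consistent with the observation that the reported level never drops near zero even while clients are being turned away.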
Send message Joined: 30 Oct 08 Posts: 3 Credit: 7,031,045 RAC: 0 |
How do other (bigger) projects handle that? E.g. WCG always delivers enough WUs! And WCG has more users. |
Send message Joined: 26 Jan 09 Posts: 589 Credit: 497,834,261 RAC: 0 |
The seemingly obvious solution is to double [triple?] up on the scheduler - find a couple of spare boxes and set them to doing nothing but scheduling to help ease the load. But perhaps there are other considerations, eg bandwidth, physical access to the boxen [CPDN has this problem a lot], whatever. Plus, the crew does have other things to occupy their time and perhaps this problem just slipped down the todo list a bit. Cheers, PeterV . |
Send message Joined: 21 Feb 09 Posts: 180 Credit: 27,806,824 RAC: 0 |
The seemingly obvious solution is to double [triple?] up on the scheduler - find a couple of spare boxes and set them to doing nothing but scheduling to help ease the load. But perhaps there are other considerations, eg bandwidth, physical access to the boxen [CPDN has this problem a lot], whatever. Plus, the crew does have other things to occupy their time and perhaps this problem just slipped down the todo list a bit. I remember one of the admins saying that they'd had a call from the University IT guys, for using 250GB total upload in a month and to try and reduce it...! |
Send message Joined: 26 Jan 09 Posts: 589 Credit: 497,834,261 RAC: 0 |
Pkzip? CPDN files tend to arrive/depart in a zipped state. But when you're doing a 30MB [zipped] upload at the end of every WU you appreciate it. Bigger WUs, so that even the GPUers do less frequent trips to the fount of WUs? But as I said before, there may be other considerations... Cheers, PeterV . |
Send message Joined: 15 Jan 09 Posts: 169 Credit: 6,734,481 RAC: 0 |
A few of us have suggested larger wu in the past. I remember Travis saying that would not happen (then 2 days later the size was increased!). I still think that would be a great idea, and not just for people with GPUs. I don't mind tasks taking 12 hours, as long as they are stable and have regular checkpoints :) |
Send message Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
A few of us have suggested larger wu in the past. I remember Travis saying that would not happen (then 2 days later the size was increased!). I still think that would be a great idea, and not just for people with GPUs. For this project I like to have at least 30 min wu's. When the project is stable, longer would be fine too. I would hate to lose credit for a few (or more) 12 hour units when the server goes off. Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. |
Send message Joined: 21 Aug 08 Posts: 625 Credit: 558,425 RAC: 0 |
A few of us have suggested larger wu in the past. I remember Travis saying that would not happen (then 2 days later the size was increased!). I still think that would be a great idea, and not just for people with GPUs. Either Dave or Travis (I think Travis) said that the current WU size for CPUs is working out for them just fine. I think what some don't realize is that future workunits are based upon the results of current workunits. This is different from projects like SETI or Einstein, where the data was recorded and is unchanging and where they just need systems to plow through chunks of the data. All this discussion really should just wait until the GPU project is up and running, then see what the situation is... |
Send message Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
A few of us have suggested larger wu in the past. I remember Travis saying that would not happen (then 2 days later the size was increased!). I still think that would be a great idea, and not just for people with GPUs. I'm not saying to change anything. The current size which runs 30 min is fine. I was just saying that longer would be ok too. Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. |
Send message Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
A few of us have suggested larger wu in the past. I remember Travis saying that would not happen (then 2 days later the size was increased!). I still think that would be a great idea, and not just for people with GPUs. Well, I think that Brian's main point is not the relative difficulty (runtime) of any given set of work on any given host. The point is that MW searches are simulations, and not rote data reduction or analysis like SAH or EAH for example. Therefore, even if the backend had unlimited capability in being able to feed work out to the client population, given the several orders of magnitude difference in processing speed between the slowest and fastest hosts, you are going to reach a point where the parallel processing streams get blocked because the next step is waiting for results from slower hosts to be returned in order to determine how to proceed for the next round of calculation. It should be apparent that this is one area where the homogeneous computing capabilities of more traditional supercomputers have an advantage. If one takes the time to look over the publications, one sees that finding methods for working around little problems like that is one of the objectives of the Computer Science aspect of MW. Alinator |
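The barrier effect described above can be made concrete with a small sketch. The host mix below is invented; only the orders-of-magnitude spread in per-task runtimes matters. It compares a fully synchronous round (wait for every result) against a quorum-based round (proceed once, say, 80% of results are back — one of the workarounds asynchronous-search research explores):

```python
# Hypothetical host population: per-task runtimes in hours spanning
# roughly three orders of magnitude, from fast GPUs to slow CPUs.
host_runtimes_h = [0.05] * 60 + [0.5] * 25 + [5.0] * 10 + [24.0] * 5

# Synchronous search: the next round of parameters cannot be generated
# until every result from the current round is back, so a single slow
# host stalls the entire population.
sync_generation_h = max(host_runtimes_h)

# Quorum-based round: proceed once 80% of results have returned,
# assuming all tasks are dispatched at the same time.
returned = sorted(host_runtimes_h)
quorum_h = returned[int(0.8 * len(returned))]

print(f"full-barrier round time: {sync_generation_h} h")
print(f"80%-quorum round time:   {quorum_h} h")
```

With this mix, the full barrier costs 24 hours per round while the 80% quorum costs half an hour — which is why dropping the strict barrier (at the cost of sometimes iterating on incomplete information) is attractive for a heterogeneous volunteer population.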
Send message Joined: 21 Aug 08 Posts: 625 Credit: 558,425 RAC: 0 |
Right. Longer runtime tasks can actually be a hindrance rather than a help. The longer the runtime, or the deadline for that matter, the larger the impact on the search if a lot of people start turning them in towards the end of the deadline. The ideal situation would be keeping a P4 to about 30-45 minutes per, then using our "wasteful" CPU calculations to feed the GPUs so that they could do more complex calculations, leaving the simpler / mundane work for CPUs. Symbiosis...rather than fighting over who has more points or who can get work and who can't...or who should or should not be participating... |
Send message Joined: 12 Apr 08 Posts: 621 Credit: 161,934,067 RAC: 0 |
It should be apparent that this is one area where the homogeneous computing capabilities of more traditional supercomputers have an advantage. If one takes the time to look over the publications, one sees that methods for working around little problems like that is one of the objectives of the Computer Science aspect of MW. GPU Grid has similar problems in that the processing is a "stream" of tasks, one building on another. I don't know how long their streams are, but the task returned by someone is re-issued as a new task to me, and when I return that it is made into another task to be issued to someone else. Their approach is currently to try to pay a bonus to participants that are returning tasks within a time period that is less than the deadline. Though this is raising some angst amongst those that want to run slower cards, my own take on it is that this is a fine example of economics at work ... create an incentive to aid the project with faster returns. I don't plan to run out and replace my 9800GT with another GPU in the next 24 hours ... especially since it seems to be earning the bonus most of the time ... in fact, I think the most cost-effective, or just the lowest-cost, system for earning a consistent, reliable bonus for work done would be one with 2-3 9800GTs installed ... but I digress ... |
Send message Joined: 29 Dec 07 Posts: 16 Credit: 158,120,935 RAC: 0 |
back to thread topic. "Is milkyway@home dead? " With a little casual observation, yes, it sure appears so. |
Send message Joined: 21 Aug 08 Posts: 625 Credit: 558,425 RAC: 0 |
back to thread topic. "Is milkyway@home dead? " My own casual observation is that "dead" as defined by you apparently means "no news on the GPU project". Work is still flowing, just not at a sufficient rate to keep everyone fully occupied. This actually means that the project is "functioning at full capacity", not that it is "dead". That said, at nearly two weeks from "almost there", I think an update is due... |
Send message Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
"functioning at full capacity" "full" is actually ~23% of where it was maxed at the end of Feb. That said, at nearly two weeks from "almost there", I think an update is due... I agree. Nothing has changed on the Gpu site in almost as long. Still 'bugs' everywhere. Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. |
Send message Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
"functioning at full capacity" Hmmm... I don't think that came out quite the way you wanted. ;-) I'd agree that the project is running currently at ~23% less credit throughput overall than what it was before the big boo-boo. :-( @ Ice: Of course, end of the year exams and other little problems like that might be throwing a wrench into the works a bit. ;-) Alinator |
Send message Joined: 6 Apr 08 Posts: 2018 Credit: 100,142,856 RAC: 0 |
@ Ice: Exams? Good luck with them. I took one recently, the first for a long, long time. It was truly horrid, all that studying into the night and it just taking over everything for a while. But it's great to pass, yippee!, and have another certificate to file away... ;) |
©2024 Astroinformatics Group