Message boards :
Number crunching :
Not D/L WU's
Author | Message |
---|---|
Send message Joined: 23 Dec 09 Posts: 4 Credit: 121,570,846 RAC: 0 |
Anyone know what this means and how I can fix it? I haven't received any WUs for almost 24 hrs.

8/20/2010 11:27:10 PM Milkyway@home Sending scheduler request: To fetch work.
8/20/2010 11:27:10 PM Milkyway@home Requesting new tasks for GPU
8/20/2010 11:27:12 PM Milkyway@home Scheduler request completed: got 0 new tasks
8/20/2010 11:27:12 PM Milkyway@home Message from server: No work sent
8/20/2010 11:27:12 PM Milkyway@home Message from server: Your app_info.xml file doesn't have a version of MilkyWay@Home N-Body Simulation.

I came home and found this; I've never seen this type of message before. I'm running the opti app. I've tried resetting the project without success. Thanks |
Send message Joined: 17 Oct 08 Posts: 36 Credit: 411,744 RAC: 0 |
Hi Mike, my guess is that at the time you polled the server, it had only nbody workunits available and none for the 'normal' Milkyway app. The server status page says that not many workunits are available in total at the moment. Even if you include a section for nbody in your app_info.xml, you could only get workunits for the CPU, since no GPU app for nbody exists so far. That makes the server's messages a little misleading: you were requesting GPU tasks, and it answered that you can't get nbody because you have no app for it. ;-) I see that you already have accounts at Collatz and DNETC; maybe you should crunch some of those in the meantime and simply keep polling Milkyway for GPU tasks? Regards |
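For reference, the nbody section the server is complaining about would look roughly like the sketch below. The internal app name is assumed here to be `milkyway_nbody`, and the version number and executable filename are hypothetical placeholders; check the project's applications page or a fresh client_state.xml for the real values. The fragment goes inside the existing `<app_info>` element.

```xml
<!-- Hypothetical app_info.xml fragment for the N-body CPU app.
     App name, version number, and file name are assumptions for
     illustration only. -->
<app>
    <name>milkyway_nbody</name>
</app>
<file_info>
    <name>milkyway_nbody_0.18_windows_x86_64.exe</name>
    <executable/>
</file_info>
<app_version>
    <app_name>milkyway_nbody</app_name>
    <version_num>18</version_num>
    <file_ref>
        <file_name>milkyway_nbody_0.18_windows_x86_64.exe</file_name>
        <main_program/>
    </file_ref>
</app_version>
```

And as the post above notes, this would only make the client eligible for CPU nbody tasks; it does nothing for GPU work.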
Send message Joined: 23 Dec 09 Posts: 4 Credit: 121,570,846 RAC: 0 |
Thanks Al. Running Dnetc while I wait for MW work. |
Send message Joined: 1 Sep 08 Posts: 520 Credit: 302,525,188 RAC: 0 |
It seems the current status is this: the *only* GPU workunits being produced are the 'regular' or 'old style' workunits.

The issue is an ongoing problem (ongoing in terms of months now) where there appears to be a memory leak or some other 'creeping' problem with the validator. Every two to seven days (the interval varies), the validator essentially gives up because of this unresolved bug. Eventually, folks at RPI become aware of the issue (perhaps less rapidly over a weekend such as now) and take action, which more often than not means restarting the server (or the specific processes), and then the cycle begins again.

Whatever the memory leak or other bug is, it appears to be quite difficult to resolve at the root-cause level, so currently the only 'fix' is a stop/start of the processes, or a full stop/start of the server, to clear out the accumulated problems for a limited time while the underlying problem remains unsolved.

I used to think it was simply the servers here expressing their displeasure at Travis being less available to them (unrequited love and all that), but I suspect that is a case of over-personification of matters more technical than human.
|
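The stop/start workaround described above could in principle be automated. Below is a minimal sketch, assuming a Linux server, root access, and a hypothetical daemon name (`validator` here stands in for whatever the real process is called); the memory limit is likewise an arbitrary illustration, not a figure from the project.

```shell
#!/bin/sh
# Hypothetical watchdog: restart a leaking daemon once its resident
# memory (RSS, in kB) crosses a threshold. Run from cron every few
# minutes. Process name and limit are assumptions for illustration.
LIMIT_KB=2097152   # 2 GB

# Return success (0) when the given RSS value exceeds the limit.
needs_restart() {
    [ "$1" -gt "$LIMIT_KB" ]
}

pid=$(pgrep -o -x validator 2>/dev/null)
if [ -n "$pid" ]; then
    rss=$(ps -o rss= -p "$pid")
    if needs_restart "$rss"; then
        kill "$pid"   # a start script or cron is assumed to relaunch it
    fi
fi
```

Crude, certainly, but it would keep the symptom in check between admin visits.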
Send message Joined: 13 Aug 10 Posts: 15 Credit: 122,278 RAC: 0 |
They could always set up a crontab to perform a reboot every 5 days or say, every Sunday at 00:00GMT. |
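For what it's worth, the Sunday-at-midnight variant is a one-line crontab entry. A sketch, assuming a Linux server whose clock runs on GMT/UTC and a root crontab (the `/sbin/shutdown` path may differ by distro):

```
# m  h  dom mon dow  command
  0  0  *   *   0    /sbin/shutdown -r now    # reboot every Sunday 00:00
```

Whether a scheduled reboot is acceptable depends, of course, on what else the box is doing at the time.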
Send message Joined: 24 Dec 07 Posts: 1947 Credit: 240,884,648 RAC: 0 |
They could always set up a crontab to perform a reboot every 5 days or say, every Sunday at 00:00GMT.

If it is a memory leak, then 5 days is too long between reboots. Every 2 days should nip it in the bud. |
Send message Joined: 1 Sep 08 Posts: 520 Credit: 302,525,188 RAC: 0 |
They could always set up a crontab to perform a reboot every 5 days or say, every Sunday at 00:00GMT.

Right, and given the time frame over which this problem has clearly existed (months, not weeks), this has been suggested more than a few times over the intervening weeks.

Some folks feel that an automated reboot, even as a workaround for what ought to be an obvious problem, would not resolve the problem even at the symptom level, and others appear to believe the problem is something else entirely. Travis noted that he has been chasing down the memory leak for some time (he acknowledged that there is a memory leak, so that *ought* to put that debate to bed) and thought the changeover to the newer workunits would resolve it. At this point it hasn't, and since the new workunits are CPU-only, even as a solution it leaves a lot of GPUs hanging out there.

My sense is that the resistance to implementing a crontab workaround perhaps has a bit of NIH to it (as in Not Invented Here). Not implementing a workaround is fine as long as the servers are constantly attended by admins who can intervene quickly, which they are not, as the admins have plenty of other things to do.

The way I look at it, the disinclination to implement a workaround in the interim reflects a desire by the decision makers around MW to ensure that other projects (Collatz and Dnet) have a regular stream of GPU processing. And in that respect, they have been fairly successful. |
Send message Joined: 13 Aug 10 Posts: 15 Credit: 122,278 RAC: 0 |
I agree. In fact, I've given up on this project until they can support Fermi. Every other project that supports CUDA works perfectly on my GTX 460, even with apps compiled against the old SDKs. The problem is that the binary simply doesn't recognize the newer Fermi cards. The solution is simple, and yet this remains an ongoing problem that is wasting a TON of WUs and everyone's time, only making things worse. |
Send message Joined: 20 Sep 08 Posts: 1391 Credit: 203,563,566 RAC: 0 |
Well, sadly, like most other ATI users I've switched over to DNETC & Collatz until appropriate nbody work is available. I would rather crunch here, but if there is no work, or it is not stable, then I can't. I don't know what the urgency to fix that will be here. All I hope is that they haven't got to the stage SETI has, where they can afford 50% downtime to deal with years of results rather than creating new work. Don't drink water, that's the stuff that rusts pipes |
Send message Joined: 13 Aug 10 Posts: 15 Credit: 122,278 RAC: 0 |
I think the problem is that there is too much emphasis on getting an OpenCL app out, instead of taking some time to make a stop-gap fix for CUDA (and newer-card recognition) and AMD's API. They're going to lose a lot of resources if they don't keep up with projects that make good use of the GPU, or at least compile new apps with newer APIs like CUDA 3.1. A project like this one, which requires double precision (FP64), is really shooting itself in the foot by not supporting Fermi well, or at all in some apps. Actually, it doesn't make much sense to use the CPU-only app on this project, because it is simply not well suited to it and is a waste of resources that could be much better used on other projects. |
Send message Joined: 21 Jun 10 Posts: 1 Credit: 20,468,673 RAC: 0 |
Newbie needing help here also. :( I've been getting this message and no new work for the last few weeks now.

06/09/2010 03:20:57 Milkyway@home Sending scheduler request: To fetch work.
06/09/2010 03:20:57 Milkyway@home Requesting new tasks for GPU
06/09/2010 03:21:02 Milkyway@home Scheduler request completed: got 0 new tasks
06/09/2010 03:21:02 Milkyway@home Message from server: No work sent
06/09/2010 03:21:02 Milkyway@home Message from server: Your app_info.xml file doesn't have a version of MilkyWay@Home N-Body Simulation.

This is my app_info.xml:

<app_info>
    <app>
        <name>milkyway</name>
    </app>
    <file_info>
        <name>astronomy_0.21_x64_SSE3.exe</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>milkyway</app_name>
        <version_num>21</version_num>
        <file_ref>
            <file_name>astronomy_0.21_x64_SSE3.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>
    <app_version>
        <app_name>milkyway</app_name>
        <version_num>20</version_num>
        <file_ref>
            <file_name>astronomy_0.21_x64_SSE3.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>
</app_info>

I really don't have a clue why it stopped sending work to me, or how to fix it, so any help at all would be greatly appreciated. |
Send message Joined: 27 Aug 10 Posts: 19 Credit: 153,747,675 RAC: 0 |
I've got the same problem now :( but only on one machine. I have another machine with the same spec and the same graphics card, and it's got loads of WUs and keeps downloading more GPU WUs. CPU WUs are fine; it's just GPU. :(

07/09/2010 11:43:09 Milkyway@home Requesting new tasks for GPU
07/09/2010 11:43:15 Milkyway@home Scheduler request completed: got 0 new tasks
07/09/2010 11:43:15 Milkyway@home Message from server: No work sent
07/09/2010 11:43:15 Milkyway@home Message from server: (reached limit of 24 tasks in progress) |
Send message Joined: 27 Aug 10 Posts: 19 Credit: 153,747,675 RAC: 0 |
It was a driver problem. Fixed now. |
©2024 Astroinformatics Group