Welcome to MilkyWay@home

Not D/L WU's


Message boards : Number crunching : Not D/L WU's
Mike029.SETI.USA [BlackOps]
Joined: 23 Dec 09
Posts: 4
Credit: 121,570,846
RAC: 0
Message 41635 - Posted: 21 Aug 2010, 14:16:23 UTC

Does anyone know what this means and how I can fix it?
I haven't received any WUs for almost 24 hours.

8/20/2010 11:27:10 PM Milkyway@home Sending scheduler request: To fetch work.
8/20/2010 11:27:10 PM Milkyway@home Requesting new tasks for GPU
8/20/2010 11:27:12 PM Milkyway@home Scheduler request completed: got 0 new tasks
8/20/2010 11:27:12 PM Milkyway@home Message from server: No work sent
8/20/2010 11:27:12 PM Milkyway@home Message from server: Your app_info.xml file doesn't have a version of MilkyWay@Home N-Body Simulation.

I came home and found this. I've never seen this type of message before. I'm running the optimized app. I've tried resetting the project, without success.

Thanks

(retired account)
Joined: 17 Oct 08
Posts: 36
Credit: 411,744
RAC: 0
Message 41640 - Posted: 21 Aug 2010, 15:46:53 UTC

Hi Mike,

My guess is that when you polled the server, it had only nbody workunits available and none for the 'normal' milkyway app. The server status page says that not many workunits are available in total at the moment.

Even if you included a section for nbody in your app_info.xml, you could only get workunits for the CPU, since no GPU app is available for nbody so far. That makes the messages from the server a little misleading: you requested GPU tasks, and it answered that you can't get nbody work because you have no app for it. ;-)
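
To see the mismatch at a glance, here is a minimal Python sketch that pulls the requested resource and the server replies out of log lines like the ones you posted. It is not part of BOINC: the function name `summarize` is made up for illustration, and the patterns assume the message wording shown in this thread.

```python
import re

# Sample lines copied from the log earlier in this thread.
LOG = """\
8/20/2010 11:27:10 PM Milkyway@home Sending scheduler request: To fetch work.
8/20/2010 11:27:10 PM Milkyway@home Requesting new tasks for GPU
8/20/2010 11:27:12 PM Milkyway@home Scheduler request completed: got 0 new tasks
8/20/2010 11:27:12 PM Milkyway@home Message from server: No work sent
8/20/2010 11:27:12 PM Milkyway@home Message from server: Your app_info.xml file doesn't have a version of MilkyWay@Home N-Body Simulation.
"""


def summarize(log_text):
    """Return (requested_resource, tasks_received, server_messages)."""
    requested = None
    received = None
    messages = []
    for line in log_text.splitlines():
        # Which resource the client asked work for (e.g. GPU).
        m = re.search(r"Requesting new tasks for (\w+)", line)
        if m:
            requested = m.group(1)
        # How many tasks the scheduler actually handed out.
        m = re.search(r"got (\d+) new tasks", line)
        if m:
            received = int(m.group(1))
        # Free-text replies from the project server.
        m = re.search(r"Message from server: (.*)", line)
        if m:
            messages.append(m.group(1).strip())
    return requested, received, messages


requested, received, messages = summarize(LOG)
print("requested:", requested, "| received:", received)
for msg in messages:
    print("server:", msg)
```

Run on the log above, it shows a GPU request sitting next to a reply about the N-Body app, which is the confusing part.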

I see that you already have accounts for Collatz and DNETC. Maybe you could crunch some of those in the meantime and simply keep polling Milkyway for GPU tasks?

Regards

Mike029.SETI.USA [BlackOps]
Joined: 23 Dec 09
Posts: 4
Credit: 121,570,846
RAC: 0
Message 41643 - Posted: 21 Aug 2010, 16:05:19 UTC - in response to Message 41640.  

Thanks, Al. Running Dnetc while I wait for MW work.

BarryAZ
Joined: 1 Sep 08
Posts: 519
Credit: 283,151,416
RAC: 943
Message 41645 - Posted: 21 Aug 2010, 18:24:15 UTC - in response to Message 41643.  

It seems that the current status is this: the *only* GPU workunits being produced are the 'regular' or 'old style' workunits. The issue is that there is an ongoing problem (ongoing for months now) where there appears to be a memory leak or some other 'creeping' problem with the validator. Every two to seven days (the time between problems varies), the validator essentially gives up because of this unresolved bug. Eventually, folks at RPI become aware of the issue (perhaps less rapidly over a weekend such as now) and take action, which more often than not means resetting the server (or the specific processes), and then the cycle begins again.

Whatever the memory leak or other bug is, it appears to be quite difficult to resolve at the root-cause level, so currently the only 'fix' is to stop and restart the processes, or the whole server, to clear out the accumulated problems. That only helps for a limited time, as the underlying problem remains unsolved.

I used to think it was simply the servers here expressing their displeasure at Travis being less available to them (unrequited love and all that), but I suspect that is a case of overpersonification of matters more technical than human.



Mutiny32*
Joined: 13 Aug 10
Posts: 15
Credit: 122,278
RAC: 0
Message 41647 - Posted: 21 Aug 2010, 18:35:13 UTC - in response to Message 41645.  

They could always set up a crontab to perform a reboot every 5 days or, say, every Sunday at 00:00 GMT.

The Gas Giant
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 41649 - Posted: 21 Aug 2010, 19:45:49 UTC - in response to Message 41647.  

If it is a memory leak then 5 days is too long between reboots. Every 2 days should nip it in the bud.
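
To compare the two schedules suggested in this thread (weekly on Sunday at 00:00 GMT versus every 2 days), here is a minimal Python sketch that lists the next few restart times for each. The helper names `next_sunday_midnight` and `restart_times` are made up for illustration, and nothing here reflects how the project servers are actually managed. For reference, the weekly schedule corresponds to the standard five-field cron expression `0 0 * * 0`, which cron evaluates in the server's local time zone.

```python
from datetime import datetime, timedelta, timezone


def next_sunday_midnight(now):
    """Return the next Sunday 00:00 UTC strictly after `now` (timezone-aware)."""
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    # datetime.weekday(): Monday == 0 ... Sunday == 6
    days_ahead = (6 - midnight.weekday()) % 7
    candidate = midnight + timedelta(days=days_ahead)
    if candidate <= now:
        # Already at or past this week's Sunday midnight: take next week's.
        candidate += timedelta(days=7)
    return candidate


def restart_times(first, interval_days, count):
    """Return `count` restart times spaced `interval_days` apart, starting at `first`."""
    return [first + timedelta(days=interval_days * i) for i in range(count)]


# Timestamp of this post (21 Aug 2010, 19:45:49 UTC).
now = datetime(2010, 8, 21, 19, 45, 49, tzinfo=timezone.utc)
first = next_sunday_midnight(now)
weekly = restart_times(first, 7, 3)
every_two_days = restart_times(first, 2, 3)

for t in weekly:
    print("weekly:      ", t.isoformat())
for t in every_two_days:
    print("every 2 days:", t.isoformat())
```

If the validator can fail as little as two days after a restart, only the shorter interval has a chance of restarting it before it falls over.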

BarryAZ
Joined: 1 Sep 08
Posts: 519
Credit: 283,151,416
RAC: 943
Message 41654 - Posted: 21 Aug 2010, 21:28:14 UTC - in response to Message 41647.  

Right, and given the time frame during which this problem has clearly existed (months, not weeks), this has been suggested more than a few times over the intervening weeks. There are some folks who feel that an automated reboot process, even as a workaround to what ought to be an obvious problem, would not resolve the problem -- even at the symptom level.

There also are others who appear to believe that this problem is something else.

Travis noted that he has been chasing down the memory leak for some time (he acknowledged that there is a memory leak, so that *ought* to put that debate to bed) and thought the changeover to the newer workunits would resolve it. At this point it hasn't, and for that matter, since the new workunits are CPU-only, even as a solution it leaves a lot of GPUs hanging out there.

I guess my sense is that the resistance to implementing a crontab workaround perhaps has a bit of NIH to it (as in Not Invented Here). And not implementing a workaround is fine as long as the servers are constantly attended by admins who can quickly intervene (which they are not as the admins have plenty of other things to do).

The way I look at it, the disinclination to implement a workaround approach in the interim reflects the desire of decision makers around MW to ensure that other projects (Collatz and Dnet) have a regular stream of GPU processing. And in this aspect, they have been fairly successful.



Mutiny32*
Joined: 13 Aug 10
Posts: 15
Credit: 122,278
RAC: 0
Message 41660 - Posted: 22 Aug 2010, 3:37:55 UTC - in response to Message 41654.  

I agree. In fact, I've given up on this project until they can support Fermi. Every other project that supports CUDA works perfectly on my GTX 460, even with apps compiled with the old SDKs. The problem is that the binary simply doesn't recognize the newer Fermi cards. The solution is simple, and yet it is an ongoing problem that is wasting a TON of WUs and everyone's time, only making the problem worse.

Chris S
Joined: 20 Sep 08
Posts: 1387
Credit: 186,726,858
RAC: 0
Message 41665 - Posted: 22 Aug 2010, 9:03:18 UTC

Well, sadly like most other ATI users, I've switched over to DNETC & Collatz, until appropriate nbody work is available. I would rather crunch here but if there is no work, or it is not stable, then I can't. I dunno what the urgency to fix that will be here.

All I hope is that they haven't reached the stage that Seti has, where they can afford 50% downtime to deal with years of results rather than creating new work.

Don't drink water, that's the stuff that rusts pipes

Mutiny32*
Joined: 13 Aug 10
Posts: 15
Credit: 122,278
RAC: 0
Message 41760 - Posted: 26 Aug 2010, 17:56:11 UTC

I think the problem is that there is too much emphasis on getting an OpenCL app out there instead of taking some time to make a stop-gap fix for CUDA (& newer card recognition) and AMD's API. They're going to lose a lot of resources if they don't keep up with projects that make good use of their GPU or at least compile new apps with newer APIs like CUDA 3.1.

This project in particular, which requires 64-bit floating point (FPU64), is really shooting itself in the foot by not supporting Fermi well, or at all in some apps. Actually, it doesn't make much sense to use the CPU-only app on this project, because the CPU is simply not well suited to it and is a waste of resources that could be much better used on other projects.

CeeGee
Joined: 21 Jun 10
Posts: 1
Credit: 20,468,673
RAC: 0
Message 41949 - Posted: 6 Sep 2010, 2:32:05 UTC

Newbie needing help here also. :(

Been getting this message and no new work for the last few weeks now.

06/09/2010 03:20:57	Milkyway@home	Sending scheduler request: To fetch work.
06/09/2010 03:20:57	Milkyway@home	Requesting new tasks for GPU
06/09/2010 03:21:02	Milkyway@home	Scheduler request completed: got 0 new tasks
06/09/2010 03:21:02	Milkyway@home	Message from server: No work sent
06/09/2010 03:21:02	Milkyway@home	Message from server: Your app_info.xml file doesn't have a version of MilkyWay@Home N-Body Simulation.



This is my app_info.xml

<app_info>
  <app>
    <name>milkyway</name>
  </app>
  <file_info>
    <name>astronomy_0.21_x64_SSE3.exe</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>milkyway</app_name>
    <version_num>21</version_num>
    <file_ref>
      <file_name>astronomy_0.21_x64_SSE3.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
  <app_version>
    <app_name>milkyway</app_name>
    <version_num>20</version_num>
    <file_ref>
      <file_name>astronomy_0.21_x64_SSE3.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>


I really don't have a clue why it stopped sending work to me or how to fix it so any help at all would be greatly appreciated.
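
For anyone checking a file like this, here is a minimal Python sketch that lists which apps and version numbers an app_info.xml declares, using only the standard library. The helper name `declared_apps` is hypothetical, and the embedded XML is the file from this post with the browser's collapse markers stripped. The thread does not give the short name the server uses for the N-Body app, so the last check uses an obviously made-up placeholder.

```python
import xml.etree.ElementTree as ET

# The app_info.xml from this post, cleaned of the "- " collapse markers.
APP_INFO = """\
<app_info>
  <app>
    <name>milkyway</name>
  </app>
  <file_info>
    <name>astronomy_0.21_x64_SSE3.exe</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>milkyway</app_name>
    <version_num>21</version_num>
    <file_ref>
      <file_name>astronomy_0.21_x64_SSE3.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
  <app_version>
    <app_name>milkyway</app_name>
    <version_num>20</version_num>
    <file_ref>
      <file_name>astronomy_0.21_x64_SSE3.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>
"""


def declared_apps(xml_text):
    """Map each declared app name to the list of version numbers it has."""
    root = ET.fromstring(xml_text)
    apps = {}
    for app_version in root.findall("app_version"):
        name = app_version.findtext("app_name").strip()
        version = int(app_version.findtext("version_num"))
        apps.setdefault(name, []).append(version)
    return apps


apps = declared_apps(APP_INFO)
print(apps)

# Placeholder only: the real N-Body app name is not given in this thread.
PLACEHOLDER_NBODY_NAME = "placeholder_nbody_app"
print("declares milkyway:", "milkyway" in apps)
print("declares placeholder N-Body app:", PLACEHOLDER_NBODY_NAME in apps)
```

The file only declares the `milkyway` app, which is consistent with the server saying it has no version of the N-Body app.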

Andy
Joined: 27 Aug 10
Posts: 19
Credit: 152,436,602
RAC: 2,298
Message 41968 - Posted: 7 Sep 2010, 10:48:19 UTC - in response to Message 41949.  

I've got the same problem now :( but only on one machine. I have another machine with the same spec and the same graphics card, and it has loads of WUs and keeps downloading more GPU WUs.

CPU WUs are fine; it's just GPU :(

07/09/2010 11:43:09 Milkyway@home Requesting new tasks for GPU
07/09/2010 11:43:15 Milkyway@home Scheduler request completed: got 0 new tasks
07/09/2010 11:43:15 Milkyway@home Message from server: No work sent
07/09/2010 11:43:15 Milkyway@home Message from server: (reached limit of 24 tasks in progress)

Andy
Joined: 27 Aug 10
Posts: 19
Credit: 152,436,602
RAC: 2,298
Message 41999 - Posted: 8 Sep 2010, 16:25:39 UTC - in response to Message 41968.  

It was a driver problem. Fixed now.
