maximum time limit elapsed bug

Author	Message
S@NL - EStorm (Joined: 15 Jul 11, Posts: 14, Credit: 5,978,191, RAC: 0)
Message 50324 - Posted: 21 Jul 2011, 14:24:46 UTC - in response to Message 50322. Last modified: 21 Jul 2011, 14:27:25 UTC
I know, Link, which is why I ran this test: to show what happens if you do not have the app_info and did not follow your solution. I also compared the client_state entries for SETI and Milkyway:
milky
<rsc_fpops_est> 25262095395789.531000</rsc_fpops_est>
<rsc_fpops_bound>2526209539578953.000000</rsc_fpops_bound>
seti
<rsc_fpops_est> 8707547718264.028300</rsc_fpops_est>
<rsc_fpops_bound>500000000000000000.000000</rsc_fpops_bound>
And I think that in Milkyway the values translate into an app speed estimate that is larger than the WU size. So they have to either increase the WU size estimate (I'm not too happy with that) or decrease the app speed estimate (which would have the same effect as using app_info).
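
The ratio between the two fields can be checked directly from the values pasted in this post. The following is a minimal sketch; the helper names `fpops_field` and `bound_ratio` are made up for illustration, and the regex only handles plain fragments like the ones quoted here, not a full client_state file.

```python
import re

# Fragments copied verbatim from the post (Message 50324).
CLIENT_STATE = {
    "milkyway": "<rsc_fpops_est> 25262095395789.531000</rsc_fpops_est> "
                "<rsc_fpops_bound>2526209539578953.000000</rsc_fpops_bound>",
    "seti": "<rsc_fpops_est> 8707547718264.028300</rsc_fpops_est> "
            "<rsc_fpops_bound>500000000000000000.000000</rsc_fpops_bound>",
}


def fpops_field(xml_text, tag):
    """Return the number inside <tag>...</tag> (hypothetical helper)."""
    match = re.search(rf"<{tag}>\s*([0-9.eE+-]+)\s*</{tag}>", xml_text)
    if match is None:
        raise ValueError(f"{tag} not found")
    return float(match.group(1))


def bound_ratio(xml_text):
    """How many times larger rsc_fpops_bound is than rsc_fpops_est."""
    est = fpops_field(xml_text, "rsc_fpops_est")
    bound = fpops_field(xml_text, "rsc_fpops_bound")
    return bound / est


if __name__ == "__main__":
    for project, fragment in CLIENT_STATE.items():
        print(f"{project}: bound/est = {bound_ratio(fragment):.1f}")
```

For these particular values the Milkyway ratio comes out at about 100, while the SETI fragment gives a ratio in the tens of thousands, so the two workunits are not directly comparable.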

John Clark (Joined: 4 Oct 08, Posts: 1734, Credit: 64,228,409, RAC: 0)
Message 50325 - Posted: 21 Jul 2011, 14:44:07 UTC
I got the 0.82 GPU client to crunch and validate OK; 4 WUs so far. This was after removing the double M from the command the client uses in the app_info file. But it seemed to cause 2 problems: (a) the GPU load was at 98%-99% on a single WU, which slowed the graphics to an unbearable crawl; (b) the NCI CPU work I am crunching keeps resetting, and that only started after Milkyway was reinstated. I am getting BM 6.12.33 to replace the 6.12.22 I currently run, to see if things improve. But I have run 6.12.22 for months without problems, so as far as I am concerned 6.12.22 seemed virtually bug-free and stable, at least until the new MW GPU client stressed it.
Go away, I was asleep

Link (Joined: 19 Jul 10, Posts: 828, Credit: 21,717,380, RAC: 6,448)
Message 50329 - Posted: 21 Jul 2011, 16:38:17 UTC - in response to Message 50324.
> seti <rsc_fpops_est>8707547718264.028300</rsc_fpops_est> <rsc_fpops_bound>500000000000000000.000000</rsc_fpops_bound>
Where did you get that from? All my SETI WUs have rsc_fpops_bound = 10x rsc_fpops_est, and that's what it's supposed to be. Milkyway has 100x rsc_fpops_est, which is VERY high compared to other projects: a CPU task that should complete within, for example, 10 hours would run for 1000 hours if it got stuck.
> I know Link which is why I did this test to explain what happens if you do not have the app_info and you did not follow your solution. (...) And I think that in milky the values translate in a larger app speed estimate than the WU size. So they have to increase the WU size estimate (not too happy with that) or decrease the app speed estimate (which would be the same as with app_info).
Without an app_info, everything usually works as it should; there is a problem on only a very small percentage of all computers. On all the machines I have seen so far, the problem was a wrong app speed estimate on new machines, where "new" means a new app version (hence it started for many with 0.82) or app information that was somehow reset on the server, like the one I posted in message 50228.
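
The arithmetic in this post can be sketched as follows. This assumes the client derives its elapsed-time limit as rsc_fpops_bound divided by the app speed estimate, which is how the thread describes the behaviour; the function names and the 1 GFLOPS figure are illustrative only and are not taken from any real host.

```python
SECONDS_PER_HOUR = 3600.0


def time_limit_hours(fpops_est, bound_multiplier, estimated_flops):
    """Elapsed-time limit in hours, assuming limit = rsc_fpops_bound / speed estimate."""
    fpops_bound = fpops_est * bound_multiplier
    return fpops_bound / estimated_flops / SECONDS_PER_HOUR


def hits_limit(fpops_est, bound_multiplier, estimated_flops, real_flops):
    """True if a healthy task would be aborted before it can finish."""
    real_hours = fpops_est / real_flops / SECONDS_PER_HOUR
    limit = time_limit_hours(fpops_est, bound_multiplier, estimated_flops)
    return real_hours > limit


if __name__ == "__main__":
    real_flops = 1e9                                  # hypothetical 1 GFLOPS app
    fpops_est = real_flops * 10 * SECONDS_PER_HOUR    # a task that really takes 10 hours

    # With an accurate speed estimate, a stuck task runs until the bound is hit.
    print(time_limit_hours(fpops_est, 10, real_flops))    # 10x bound
    print(time_limit_hours(fpops_est, 100, real_flops))   # 100x bound

    # With a speed estimate 200x too high, even a 100x bound is too small.
    print(hits_limit(fpops_est, 100, 200 * real_flops, real_flops))
```

Under that assumption, a speed estimate that is too high by more than the bound multiplier makes a perfectly healthy task exceed its limit, which would match the "maximum time limit elapsed" errors seen on hosts with a wrong app speed estimate.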

Beyond (Joined: 15 Jul 08, Posts: 384, Credit: 743,535,976, RAC: 39,190)
Message 50330 - Posted: 21 Jul 2011, 16:41:12 UTC - in response to Message 50325.
> But, it seemed to cause 2 problems - (a) the GPU load was at 98%-99% on a single WU, and slowed the graphics to an unbearable crawl; (b) the NCI CPU work I am crunching keeps on resetting, and that only happened after Milkyway was reinstated.
Try this command in your app_info.xml:
<cmdline>--gpu-target-frequency 55 </cmdline>
Increase the number for less graphics lag. The default is 35. Try 6.12.33 for the 2nd issue, as various earlier 6.12.xx versions had inordinate task-switching problems.

S@NL - EStorm (Joined: 15 Jul 11, Posts: 14, Credit: 5,978,191, RAC: 0)
Message 50331 - Posted: 21 Jul 2011, 17:06:26 UTC - in response to Message 50329.
> seti <rsc_fpops_est>8707547718264.028300</rsc_fpops_est> <rsc_fpops_bound>500000000000000000.000000</rsc_fpops_bound>
> Where did you get that from? All my SETI WUs have rsc_fpops_bound = 10x rsc_fpops_est, and that's what it's supposed to be. Milkyway has 100x rsc_fpops_est, which is VERY high compared to other projects, a CPU task which should be completed within for example 10 hours would run 1000 hours if it gets stuck.
As mentioned, from the client_state file on one of my machines. But it doesn't really matter. I don't know why the values differ so much; it could be because I replaced my 2x5770 with a 6950. When I started a week ago, I checked the properties of a running WU, and there the app speed estimate was greater than the WU size estimate, hence the problem. That was fixed by creating an app_info file.