maximum time limit elapsed bug
Author | Message |
---|---|
Joined: 15 Jul 11 Posts: 14 Credit: 5,978,191 RAC: 0 |
I know, Link, which is why I did this test: to explain what happens if you do not have the app_info and did not follow your solution. I also checked the difference in client_state between SETI and Milkyway: milky <rsc_fpops_est>25262095395789.531000</rsc_fpops_est> <rsc_fpops_bound>2526209539578953.000000</rsc_fpops_bound> seti <rsc_fpops_est>8707547718264.028300</rsc_fpops_est> <rsc_fpops_bound>500000000000000000.000000</rsc_fpops_bound> And I think that in Milkyway the values translate into an app speed estimate that is larger than the WU size estimate. So they have to either increase the WU size estimate (not too happy with that) or decrease the app speed estimate (which would be the same as with app_info). |
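For anyone who wants to compare the two sets of values quoted above, here is a minimal Python sketch that works out the ratio of rsc_fpops_bound to rsc_fpops_est. The numbers are copied from the post; the <workunit> wrapper, the dictionary keys and the function name are only there for illustration and are not taken from any real client_state.xml layout.

```python
import xml.etree.ElementTree as ET

# Values copied from the post above. The <workunit> wrapper element and the
# project labels are illustrative assumptions, not a real client_state.xml.
FRAGMENTS = {
    "milkyway": (
        "<workunit>"
        "<rsc_fpops_est>25262095395789.531000</rsc_fpops_est>"
        "<rsc_fpops_bound>2526209539578953.000000</rsc_fpops_bound>"
        "</workunit>"
    ),
    "seti": (
        "<workunit>"
        "<rsc_fpops_est>8707547718264.028300</rsc_fpops_est>"
        "<rsc_fpops_bound>500000000000000000.000000</rsc_fpops_bound>"
        "</workunit>"
    ),
}


def bound_to_est_ratio(xml_text):
    """Return rsc_fpops_bound divided by rsc_fpops_est for one fragment."""
    workunit = ET.fromstring(xml_text)
    est = float(workunit.findtext("rsc_fpops_est"))
    bound = float(workunit.findtext("rsc_fpops_bound"))
    return bound / est


if __name__ == "__main__":
    for project, fragment in FRAGMENTS.items():
        print(f"{project}: bound / est = {bound_to_est_ratio(fragment):.1f}")
```

With these values the Milkyway bound comes out at about 100 times the estimate, while the SETI figure quoted here is a much larger multiple.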
Joined: 4 Oct 08 Posts: 1734 Credit: 64,228,409 RAC: 0 |
Got the 0.82 GPU client to crunch OK and validate - 4 WUs concerned. This was after removing the double M from the client command line in the app_info file. But, it seemed to cause 2 problems - (a) the GPU load was at 98%-99% on a single WU, and slowed the graphics to an unbearable crawl; (b) the NCI CPU work I am crunching keeps on resetting, and that only happened after Milkyway was reinstated. I am getting BM 6.12.33 to replace the 6.12.22 I currently run, to see if things improve. But I have run 6.12.22 for months without problems, so as far as I am concerned 6.12.22 seemed to be virtually bug free and stable. That is, until the new MW GPU client stressed it. Go away, I was asleep |
Joined: 19 Jul 10 Posts: 624 Credit: 19,290,347 RAC: 2,077 |
"seti" Where did you get that from? All my SETI WUs have rsc_fpops_bound = 10x rsc_fpops_est, and that's what it's supposed to be. Milkyway has 100x rsc_fpops_est, which is VERY high compared to other projects: a CPU task which should be completed within, for example, 10 hours would run 1000 hours if it gets stuck. "I know, Link, which is why I did this test: to explain what happens if you do not have the app_info and did not follow your solution." What happens without an app_info is that usually everything works as it should; only on a very small percentage of all computers is there a problem. The problem on all machines I have seen until now was a wrong app speed estimate on new machines, where new means a new app version (hence it started for many with 0.82) or app information that was somehow reset on the server, like the one I posted in message 50228. |
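The 10-hour versus 1000-hour example above is just the multiplier applied to the expected runtime. Below is a rough Python sketch of that arithmetic. It assumes the client stops a task once its elapsed time exceeds rsc_fpops_bound divided by the app speed estimate; that is my understanding of the time limit check, not something confirmed in this thread. All the numbers (true speed, multipliers, the 200x overestimate) are made up for illustration.

```python
HOUR = 3600.0  # seconds per hour


def time_limit_hours(fpops_est, bound_multiplier, flops_estimate):
    """Elapsed-time limit in hours, assuming limit = rsc_fpops_bound / flops.

    bound_multiplier is rsc_fpops_bound / rsc_fpops_est (10 or 100 in the
    post above); flops_estimate is the app speed estimate.
    """
    fpops_bound = fpops_est * bound_multiplier
    return fpops_bound / flops_estimate / HOUR


def would_hit_limit(real_runtime_hours, limit_hours):
    """True if a task needing real_runtime_hours would exceed the limit."""
    return real_runtime_hours > limit_hours


if __name__ == "__main__":
    true_flops = 1e9                      # hypothetical real speed of the host
    fpops_est = 10 * HOUR * true_flops    # a task that really takes 10 hours

    # Correct speed estimate: the limit is simply multiplier x runtime.
    print("10x bound :", time_limit_hours(fpops_est, 10, true_flops), "h")
    print("100x bound:", time_limit_hours(fpops_est, 100, true_flops), "h")

    # Hypothetical speed estimate that is 200x too high: even a 100x bound
    # now gives a limit shorter than the real 10-hour runtime.
    limit = time_limit_hours(fpops_est, 100, 200 * true_flops)
    print("limit with 200x overestimate:", limit, "h")
    print("task would be stopped:", would_hit_limit(10, limit))
```

Under that assumption, a generous bound only protects a task as long as the speed estimate is roughly right, which fits the observation that the affected machines had a wrong app speed estimate.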
Joined: 15 Jul 08 Posts: 383 Credit: 729,293,740 RAC: 0 |
"But, it seemed to cause 2 problems - (a) the GPU load was at 98%-99% on a single WU, and slowed the graphics to an unbearable crawl; (b) the NCI CPU work I am crunching keeps on resetting, and that only happened after Milkyway was reinstated." Try this command in your app_info.xml: <cmdline>--gpu-target-frequency 55 </cmdline> Increase the number for less graphics lag. Default is 35. Try 6.12.33 for the 2nd issue, as various earlier 6.12.xx versions had inordinate task switching problems. |
Joined: 15 Jul 11 Posts: 14 Credit: 5,978,191 RAC: 0 |
"seti" As mentioned, from the client_state file on one of my machines. But it doesn't really matter. And I don't know why the values have such a big difference. Could be because I replaced my 2x5770 with a 6950. When I started a week ago I checked the properties of a running WU, and there the app speed estimate was greater than the WU size estimate, hence the problem. Which was fixed by creating an app_info file. |